46 results on '"Nelle Varoquaux"'
Search Results
2. Genome-resolved metagenomics reveals role of iron metabolism in drought-induced rhizosphere microbiome dynamics
- Author
-
Ling Xu, Zhaobin Dong, Dawn Chiniquy, Grady Pierroz, Siwen Deng, Cheng Gao, Spencer Diamond, Tuesday Simmons, Heidi M.-L. Wipf, Daniel Caddell, Nelle Varoquaux, Mary A. Madera, Robert Hutmacher, Adam Deutschbauer, Jeffery A. Dahlberg, Mary Lou Guerinot, Elizabeth Purdom, Jillian F. Banfield, John W. Taylor, Peggy G. Lemaux, and Devin Coleman-Derr
- Subjects
Science
- Abstract
Advances in omics provide a tool to understand mechanisms for plant–microbial interactions under stress. Here the authors apply genome-resolved metagenomics to investigate sorghum and its microbiome responses to drought, identifying an unexpected role of iron metabolism.
- Published
- 2021
3. Cell Wall Compositions of Sorghum bicolor Leaves and Roots Remain Relatively Constant Under Drought Conditions
- Author
-
Tess Scavuzzo-Duggan, Nelle Varoquaux, Mary Madera, John P. Vogel, Jeffery Dahlberg, Robert Hutmacher, Michael Belcher, Jasmine Ortega, Devin Coleman-Derr, Peggy Lemaux, Elizabeth Purdom, and Henrik V. Scheller
- Subjects
Sorghum bicolor; drought; cell wall; biomass conversion and expansion factor (BCEF); pre-flowering; post-flowering; Plant culture; SB1-1110
- Abstract
Renewable fuels are needed to replace fossil fuels in the immediate future. Lignocellulosic bioenergy crops provide a renewable alternative that sequesters atmospheric carbon. To prevent displacement of food crops, it would be advantageous to grow biofuel crops on marginal lands. These lands will likely face more frequent and extreme drought conditions than conventional agricultural land, so it is crucial to see how proposed bioenergy crops fare under these conditions and how that may affect lignocellulosic biomass composition and saccharification properties. We found that while drought impacts the plant cell wall of Sorghum bicolor differently according to tissue and timing of drought induction, drought-induced cell wall compositional modifications are relatively minor and produce no negative effect on biomass conversion. This contrasts with the cell wall-related transcriptome, which had a varied range of highly variable genes (HVGs) within four cell wall-related GO categories, depending on the tissues surveyed and time of drought induction. Further, many HVGs had expression changes in which putative impacts were not seen in the physical cell wall or which were in opposition to their putative impacts. Interestingly, most pre-flowering drought-induced cell wall changes occurred in the leaf, with matrix and lignin compositional changes that did not persist after recovery from drought. Most measurable physical post-flowering cell wall changes occurred in the root, affecting mainly polysaccharide composition and cross-linking. This study couples transcriptomics to cell wall chemical analyses of a C4 grass experiencing progressive and differing drought stresses in the field. As such, we can analyze the cell wall-specific response to agriculturally relevant drought stresses on the transcriptomic level and see whether those changes translate to compositional or biomass conversion differences. 
Our results bolster the conclusion that drought stress does not substantially affect the cell wall composition of specific aerial and subterranean biomass nor impede enzymatic hydrolysis of leaf biomass, a positive result for biorefinery processes. Coupled with previously reported results on the root microbiome and rhizosphere and whole transcriptome analyses of this study, we can formulate and test hypotheses on individual gene candidates’ function in mediating drought stress in the grass cell wall, as demonstrated in sorghum.
- Published
- 2021
4. Effective normalization for copy number variation in Hi-C data
- Author
-
Nicolas Servant, Nelle Varoquaux, Edith Heard, Emmanuel Barillot, and Jean-Philippe Vert
- Subjects
Normalization; Hi-C; Cancer; Copy-number; Computer applications to medicine. Medical informatics; R858-859.7; Biology (General); QH301-705.5
- Abstract
Background: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. Results: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. Conclusions: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
- Published
- 2018
5. Changes in genome organization of parasite-specific gene families during the Plasmodium transmission stages
- Author
-
Evelien M. Bunnik, Kate B. Cook, Nelle Varoquaux, Gayani Batugedara, Jacques Prudhomme, Anthony Cort, Lirong Shi, Chiara Andolina, Leila S. Ross, Declan Brady, David A. Fidock, Francois Nosten, Rita Tewari, Photini Sinnis, Ferhat Ay, Jean-Philippe Vert, William Stafford Noble, and Karine G. Le Roch
- Subjects
Science
- Abstract
The development of malaria parasites is controlled by coordinated changes in gene expression. Here, the authors show that the three-dimensional genome structure of human malaria parasites is strongly connected with transcriptional activity of specific gene families throughout the life cycles of Plasmodium falciparum and Plasmodium vivax parasites.
- Published
- 2018
6. Inference of 3D genome architecture by modeling overdispersion of Hi-C data.
- Author
-
Nelle Varoquaux, William S. Noble, and Jean-Philippe Vert
- Published
- 2023
7. Inferring Diploid 3D Chromatin Structures from Hi-C Data.
- Author
-
Alexandra Gesine Cauer, Gürkan Yardimci, Jean-Philippe Vert, Nelle Varoquaux, and William Stafford Noble
- Published
- 2019
8. Community, Time, and (Con)text: A Dynamical Systems Analysis of Online Communication and Community Health among Open-Source Software Communities.
- Author
-
Alexandra Paxton, Nelle Varoquaux, Chris Holdgraf, and R. Stuart Geiger
- Published
- 2022
9. A pipeline to analyse time-course gene expression data [version 1; peer review: 2 approved with reservations]
- Author
-
Nelle Varoquaux and Elizabeth Purdom
- Subjects
Method Article; Articles; time-course gene expression data; clustering; differential expression; workflow
- Abstract
The phenotypic diversity of cells is governed by a complex equilibrium between their genetic identity and their environmental interactions: understanding the dynamics of gene expression is a fundamental question of biology. However, analysing time-course transcriptomic data raises unique and challenging statistical and computational questions, requiring the development of novel methods and software. This workflow provides a step-by-step tutorial of the methodology used to analyse time-course data: (1) quality control and normalization of the dataset; (2) differential expression analysis using functional data analysis; (3) clustering of time-course data; (4) interpreting clusters with GO term and KEGG pathway enrichment analysis. As a case study, we apply this workflow to time-course transcriptomic data from mice exposed to four strains of influenza to showcase every step of the pipeline.
- Published
- 2020
10. The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries - A Collaborative Ethnography of Documentation Work.
- Author
-
R. Stuart Geiger, Nelle Varoquaux, Charlotte Mazel-Cabasse, and Chris Holdgraf
- Published
- 2018
11. A statistical approach for inferring the 3D structure of the genome.
- Author
-
Nelle Varoquaux, Ferhat Ay, William Stafford Noble, and Jean-Philippe Vert
- Published
- 2014
12. iced: fast and memory efficient normalization of contact maps.
- Author
-
Nelle Varoquaux and Nicolas Servant
- Published
- 2019
13. 8th European Conference on Python in Science (EuroSciPy 2015).
- Author
-
Nelle Varoquaux
- Published
- 2016
14. Inference of genome 3D architecture by modeling overdispersion of Hi-C data
- Author
-
Nelle Varoquaux, William Stafford Noble, Jean-Philippe Vert, Univ. Grenoble Alpes, CNRS, CHU Grenoble Alpes, Grenoble INP, TIMC-IMAG, 38000 Grenoble, France, Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Google Brain, Paris, Centre de Bioinformatique (CBIO), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Translational microbial Evolution and Engineering (TIMC-TrEE), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Google Research [Paris], Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), and Varoquaux, Nelle
- Subjects
0303 health sciences; Computer science; [SDV]Life Sciences [q-bio]; Negative binomial distribution; Inference; Function (mathematics); Poisson distribution; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; [SDV] Life Sciences [q-bio]; 03 medical and health sciences; symbols.namesake; 0302 clinical medicine; Overdispersion; symbols; Poisson regression; Multidimensional scaling; Algorithm; 030217 neurology & neurosurgery; 030304 developmental biology; Count data
- Abstract
We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. We first confirm the presence of overdispersion in several real Hi-C data sets, and we show that the overdispersion arises even in simulated data sets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms: three MDS-based methods (ShRec3D, ChromSDE, and Pastis-MDS) and a statistical method based on a Poisson model of the data (Pastis-PM). We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. Supplementary information is available at https://nellev.github.io/pastisnb/
- Published
- 2022
15. Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses
- Author
-
Joy Hollingsworth, John P. Vogel, Robert B. Hutmacher, Vasanth R. Singan, Axel Visel, Grady Pierroz, Benjamin J. Cole, Mary Madera, John W. Taylor, Devin Coleman-Derr, Christer Jansson, Christopher R. Baker, Jeffery A. Dahlberg, Ronan C. O'Malley, Julie A. Sievert, Stephanie DeGraaf, Peggy G. Lemaux, Krishna K. Niyogi, Matthew J. Blow, Elizabeth Purdom, Tim L. Jeffers, Ling Xu, Yuko Yoshinaga, Maria J. Harrison, Cheng Gao, Judith A Owiti, Dhruv Patel, Nelle Varoquaux, Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Génomique et Évolution des Microorganismes (TIMC-IMAG-GEM ), Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications Grenoble - UMR 5525 (TIMC-IMAG), and Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )
- Subjects
0106 biological sciences; [SDV]Life Sciences [q-bio]; Drought tolerance; Plant Biology; arbuscular mycorrhizal fungi; drought; Photosynthesis; 7. Clean energy; 01 natural sciences; Crop; 03 medical and health sciences; S. bicolor; Symbiosis; parasitic diseases; RNA-Seq; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; 2. Zero hunger; 0303 health sciences; Multidisciplinary; biology; fungi; food and beverages; Plant physiology; Biological Sciences; 15. Life on land; biology.organism_classification; Sorghum; PNAS Plus; Agronomy; Seedling; Sweet sorghum; 010606 plant biology & botany
- Abstract
Significance: Understanding the molecular response of plants to drought is critical to efforts to improve agricultural yields under increasingly frequent droughts. We grew 2 cultivars of the naturally drought-tolerant food crop sorghum in the field under drought stress. We sequenced the mRNA from weekly samples of these plants, resulting in a molecular profile of drought response over the growing season. We find molecular differences in the 2 cultivars that help explain their differing tolerances to drought and evidence of a disruption in the plant’s symbiosis with arbuscular mycorrhizal fungi. Our findings are of practical importance for agricultural breeding programs, while the resulting data are a resource for the plant and microbial communities for studying the dynamics of drought response. Drought is the most important environmental stress limiting crop yields. The C4 cereal sorghum [Sorghum bicolor (L.) Moench] is a critical food, forage, and emerging bioenergy crop that is notably drought-tolerant. We conducted a large-scale field experiment, imposing preflowering and postflowering drought stress on 2 genotypes of sorghum across a tightly resolved time series, from plant emergence to postanthesis, resulting in a dataset of nearly 400 transcriptomes. We observed a fast and global transcriptomic response in leaf and root tissues with clear temporal patterns, including modulation of well-known drought pathways. We also identified genotypic differences in core photosynthesis and reactive oxygen species scavenging pathways, highlighting possible mechanisms of drought tolerance and of the delayed senescence, characteristic of the stay-green phenotype. Finally, we discovered a large-scale depletion in the expression of genes critical to arbuscular mycorrhizal (AM) symbiosis, with a corresponding drop in AM fungal mass in the plants’ roots.
- Published
- 2019
16. circHiC: circular visualization of Hi-C data and integration of genomic data
- Author
-
Ivan Junier, Nelle Varoquaux, Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), and Université Grenoble Alpes (UGA)
- Subjects
0303 health sciences; biology; Computer science; Circular bacterial chromosome; Genomic data; [SDV]Life Sciences [q-bio]; Chromosome; Python (programming language); biology.organism_classification; 01 natural sciences; Genome; 010305 fluids & plasmas; Visualization; 03 medical and health sciences; Tree (data structure); Computer graphics (images); 0103 physical sciences; computer; Bacteria; 030304 developmental biology; computer.programming_language
- Abstract
Summary: Genome-wide contact frequencies obtained using Hi-C-like experiments have raised novel challenges in terms of visualization and rationalization of chromosome structuring phenomena. In bacteria, display of Hi-C data should be congruent with the circularity of chromosomes. However, standard representations in the form of square matrices or horizontal bands are not adapted to periodic conditions such as those imposed by (most) bacterial chromosomes. Here, we fill this gap and propose a Python library, built upon the widely used Matplotlib library, to display Hi-C data in circular strips, together with the possibility to overlay genomic data. The proposed tools are light and fast, aiming to facilitate the exploration and understanding of bacterial chromosome structuring data. The library further includes the possibility to handle linear chromosomes, providing a fresh way to display and explore eukaryotic data. Availability and implementation: The package runs under Python 3 and is freely available at https://github.com/TrEE-TIMC/circHiC. The documentation can be found at https://tree-timc.github.io/circhic/; images obtained in different organisms are provided in the gallery section and are accompanied by code. Contact: ivan.junier@univ-grenoble-alpes.fr, nelle.varoquaux@univ-grenoble-alpes.fr
- Published
- 2021
17. Unveiling the links between peptide identification and differential analysis FDR controls by means of a practical introduction to knockoff filters
- Author
-
Thomas Burger, Nelle Varoquaux, Lucas Etourneau, Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Translational microbial Evolution and Engineering (TIMC-TrEE), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Infrastructure Nationale de Protéomique, FR2048 ProFI, and ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019)
- Subjects
Proteomics; Alternative methods; Protocol (science); 0303 health sciences; Computer science; computer.software_genre; 01 natural sciences; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Differential analysis; 010104 statistics & probability; 03 medical and health sciences; Identification (information); Data mining; 0101 mathematics; Peptides; computer; Algorithms; 030304 developmental biology
- Abstract
In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent and alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target-decoy competition procedure, classically used in proteomics for FDR control at peptide identification. To provide practitioners with a unified understanding of FDR control in proteomics, we apply the knockoff procedure on real and simulated quantitative datasets. Leveraging these comparisons, we propose to adapt the knockoff procedure to better fit the specificities of quantitative proteomic data (mainly very few samples). The performance of the knockoff procedure is compared with that of the classical Benjamini-Hochberg procedure, thereby shedding new light on the strengths and weaknesses of target-decoy competition.
- Published
- 2021
18. Unveiling the Links Between Peptide Identification and Differential Analysis FDR Controls by Means of a Practical Introduction to Knockoff Filters
- Author
-
Lucas Etourneau, Nelle Varoquaux, and Thomas Burger
- Published
- 2021
19. Computational Tools for the Multiscale Analysis of Hi-C Data in Bacterial Chromosomes
- Author
-
Nelle Varoquaux, Virginia S. Lioy, Frédéric Boccard, and Ivan Junier
- Subjects
Molecular Conformation; Chromosomes, Bacterial; Software
- Abstract
Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework aiming at capturing and quantifying these properties. These include both standard tools (e.g., contact laws) and novel ones such as an index that allows identifying loci involved in domain formation independently of the structuring scale at play. Our objective is twofold. On the one hand, we aim at providing full, understandable Python/Jupyter-based code which can be used by both computer scientists and biologists with no advanced computational background. On the other hand, we discuss statistical issues inherent to Hi-C data analysis, focusing more particularly on how to properly assess the statistical significance of results. As a pedagogical example, we analyze data produced in Pseudomonas aeruginosa, a model pathogenic bacterium. All files (code and input data) can be found in a GitHub repository. We have also embedded the files into a Binder package so that the full analysis can be run on any machine over the Internet.
- Published
- 2021
20. Biosynthesis and Physiology of Ubiquinone under anaerobic conditions
- Author
-
Katayoun Kazemzadeh, Sophie-Carole Chobert, Mahmoud Hajj Chehade, Nelle Varoquaux, John Willison, Ivan Junier, Sophie Abby, Ludovic Pelosi, and Fabien Pierrel
- Subjects
Biophysics; Cell Biology; Biochemistry
- Published
- 2022
21. Anaerobic ubiquinone biosynthetic pathway in Escherichia coli
- Author
-
Katayoun Kazemzadeh, Sophie Abby, Sophie-Carole Chobert, Mahmoud Hajj Chehade, Nelle Varoquaux, Emmanuel Sechet, Emmanuelle Bouveret, Frédéric Barras, Ludovic Pelosi, and Fabien Pierrel
- Subjects
Biophysics; Cell Biology; Biochemistry
- Published
- 2022
22. Evolutionary scenario of substrate regio-selectivity for hydroxylases involved in ubiquinone production
- Author
-
Katayoun Kazemzadeh, Clothilde Chenal, Mahmoud Hajj Chehade, William Schmitt, Qiqi He, Manon Jarzynka, Nelle Varoquaux, Ludovic Pelosi, Ivan Junier, Fabien Pierrel, and Sophie Abby
- Subjects
Biophysics; Cell Biology; Biochemistry
- Published
- 2022
23. Proceedings of the 6th European Conference on Python in Science (EuroSciPy 2013).
- Author
-
Pierre de Buyl and Nelle Varoquaux
- Published
- 2014
24. Proceedings of the 7th European Conference on Python in Science (EuroSciPy 2014).
- Author
-
Pierre de Buyl and Nelle Varoquaux
- Published
- 2014
25. Successional adaptive strategies revealed by correlating arbuscular mycorrhizal fungal abundance with host plant gene expression
- Author
-
Peggy G. Lemaux, Devin Coleman-Derr, Liliam Montoya, Cheng Gao, Ling Xu, Jeffery A. Dahlberg, John W. Taylor, Elizabeth Purdom, Benjamin J. Cole, Pierre-Emmanuel Courty, John P. Vogel, Nelle Varoquaux, Robert B. Hutmacher, University of California [Berkeley], University of California, Agroécologie [Dijon], Université de Bourgogne (UB)-AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Translational microbial Evolution and Engineering (TIMC-TrEE), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), and U.S. Department of Energy [Washington] (DOE)
- Subjects
2. Zero hunger; Adaptive strategies; biology; [SDV]Life Sciences [q-bio]; fungi; food and beverages; Strigolactone; Ecological succession; 15. Life on land; Sorghum; biology.organism_classification; Genetically modified organism; Abundance (ecology); Botany; Gene expression; Genetics; Ruderal species; Ecology, Evolution, Behavior and Systematics
- Abstract
Arbuscular mycorrhizal fungi (AMF), the mutualistic symbionts of most crops, constitute a research system of human-associated fungi whose relative simplicity and synchrony are conducive to experimental ecology. However, little is known about the shifts in adaptive strategies of sorghum-associated AMF, where strong AMF succession replaces initially ruderal species with competitive ones and where the strongest plant response to drought is to manage these AMF. First, we hypothesize that, when irrigation is stopped to mimic drought, competitive AMF species should be replaced by AMF species tolerant to drought stress. We then, for the first time, correlate AMF abundance and host plant transcription to test two novel hypotheses about the mechanisms behind the shift from ruderal to competitive AMF. Surprisingly, despite imposing drought stress, we found no stress-tolerant AMF. Remarkably, we found strong and differential correlation between the successional shift from ruderal to competitive AMF and sorghum genes whose products (i) produce and release strigolactone signals, (ii) perceive mycorrhizal-lipochitinoligosaccharide (Myc-LCO) signals, (iii) provide plant lipid and sugar to AMF, and (iv) import minerals and water provided by AMF. These novel insights into host gene expression and succession of AMF show adaptive strategies evolved by AMF and their hosts and provide a rationale for selecting AMF to reduce inputs and maximize yield in commercial agriculture. Future research opportunities include testing the specifics and generality of our hypotheses by employing genetically modified host plants, and exploring additional genes underlying the adaptive strategies in natural succession.
- Published
- 2021
26. Genome-resolved metagenomics reveals role of iron metabolism in drought-induced rhizosphere microbiome dynamics
- Author
-
Peggy G. Lemaux, Siwen Deng, Daniel F. Caddell, Nelle Varoquaux, Mary Lou Guerinot, Dawn Chiniquy, Ling Xu, Heidi M.-L. Wipf, Jeffery A. Dahlberg, Mary Madera, John W. Taylor, Elizabeth Purdom, Spencer Diamond, Cheng Gao, Zhaobin Dong, Robert B. Hutmacher, Devin Coleman-Derr, Tuesday Simmons, Jillian F. Banfield, Grady Pierroz, Adam M. Deutschbauer, University of California [Berkeley], University of California, China State Key Laboratory of Plant Physiology and Biochemistry, China Agricultural University (CAU), Lawrence Berkeley National Laboratory [Berkeley] (LBNL), USDA-ARS : Agricultural Research Service, Translational microbial Evolution and Engineering (TIMC-TrEE), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), University of California [Davis] (UC Davis), and Dartmouth College [Hanover]
- Subjects
0106 biological sciences; 0301 basic medicine; Acclimatization; [SDV]Life Sciences [q-bio]; General Physics and Astronomy; 01 natural sciences; Plant Roots; 2.1 Biological and endogenous factors; RNA-Seq; Aetiology; Soil Microbiology; 2. Zero hunger; Genetics; Rhizosphere; Multidisciplinary; Microbiota; food and beverages; Crop Production; Droughts; Actinobacteria; Soil microbiology; Plant molecular biology; Science; Iron; Physiological; Drought tolerance; Biology; Stress; General Biochemistry, Genetics and Molecular Biology; Article; 03 medical and health sciences; Stress, Physiological; parasitic diseases; Microbe; Microbiome; Sorghum; Comparative genomics; Drought; Human Genome; fungi; Root microbiome; General Chemistry; 15. Life on land; biology.organism_classification; 030104 developmental biology; 13. Climate action; Metagenomics; Food Security; 010606 plant biology & botany
- Abstract
Recent studies have demonstrated that drought leads to dramatic, highly conserved shifts in the root microbiome. At present, the molecular mechanisms underlying these responses remain largely uncharacterized. Here we employ genome-resolved metagenomics and comparative genomics to demonstrate that carbohydrate and secondary metabolite transport functionalities are overrepresented within drought-enriched taxa. These data also reveal that bacterial iron transport and metabolism functionality is highly correlated with drought enrichment. Using time-series root RNA-Seq data, we demonstrate that iron homeostasis within the root is impacted by drought stress, and that loss of a plant phytosiderophore iron transporter impacts microbial community composition, leading to significant increases in the drought-enriched lineage, Actinobacteria. Finally, we show that exogenous application of iron disrupts the drought-induced enrichment of Actinobacteria, as well as their improvement in host phenotype during drought stress. Collectively, our findings implicate iron metabolism in the root microbiome’s response to drought and may inform efforts to improve plant drought tolerance to increase food security. Advances in omics provide a tool to understand mechanisms for plant–microbial interactions under stress. Here the authors apply genome-resolved metagenomics to investigate sorghum and its microbiome responses to drought, identifying an unexpected role of iron metabolism.
- Published
- 2021
27. Dynamics of the compartmentalized Streptomyces chromosome during metabolic differentiation
- Author
-
Nelle Varoquaux, Bertrand Aigle, Sylvie Lautru, Jean-Noël Lorenzi, Corinne Saulnier, Jean-Luc Pernodet, Ivan Junier, Virginia S. Lioy, Yan Jaszczyszyn, Kevin Gorrichon, Hervé Leh, Frédéric Boccard, Thibault Poinsignon, Soumaya Najah, Pierre Leblond, Stéphanie Bury-Moné, Annabelle Thibessard, Olivier Lespinet, Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Conformation et Ségrégation du chromosome bactérien (OCB), Département Biologie des Génomes (DBG), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Microbiologie Moléculaire des Actinomycètes (ACTINO), Département Microbiologie (Dpt Microbio), Dynamique des Génomes et Adaptation Microbienne (DynAMic), Université de Lorraine (UL)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), BioInformatique Moléculaire (BIM), Plateforme de séquençage à haut débit (NGS), Département Plateforme (PF I2BC), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Translational microbial Evolution and Engineering (TIMC-TrEE), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Réarrangements programmés du génome (MICMAC), laboratoire TIMC-IMAG, and Université Grenoble Alpes (COMUE) (UGA)
- Subjects
Genome evolution ,Science ,[SDV]Life Sciences [q-bio] ,General Physics and Astronomy ,Genomics ,Streptomyces ,Genome ,Article ,Chromosomes ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,Gene expression ,Cellular microbiology ,Gene ,Bacterial genomics ,030304 developmental biology ,Genetics ,0303 health sciences ,Multidisciplinary ,biology ,030306 microbiology ,Dynamics (mechanics) ,Chromosome ,Gene Expression Regulation, Bacterial ,General Chemistry ,Chromosomes, Bacterial ,Compartmentalization (psychology) ,biology.organism_classification ,[SDV.MP.BAC]Life Sciences [q-bio]/Microbiology and Parasitology/Bacteriology ,Anti-Bacterial Agents ,Metabolism ,Chromosome Structures ,Multigene Family ,Transcriptome ,Genome, Bacterial - Abstract
Bacteria of the genus Streptomyces are prolific producers of specialized metabolites, including antibiotics. The linear chromosome includes a central region harboring core genes, as well as extremities enriched in specialized metabolite biosynthetic gene clusters. Here, we show that chromosome structure in Streptomyces ambofaciens correlates with genetic compartmentalization during exponential phase. Conserved, large and highly transcribed genes form boundaries that segment the central part of the chromosome into domains, whereas the terminal ends tend to be transcriptionally quiescent compartments with different structural features. The onset of metabolic differentiation is accompanied by a rearrangement of chromosome architecture, from a rather ‘open’ to a ‘closed’ conformation, in which highly expressed specialized metabolite biosynthetic genes form new boundaries. Thus, our results indicate that the linear chromosome of S. ambofaciens is partitioned into structurally distinct entities, suggesting a link between chromosome folding, gene expression and genome evolution. Streptomyces bacteria have a linear chromosome, with core genes located in the central region and gene clusters for specialized metabolite biosynthesis found in the ‘arms’. Here, Lioy et al. show that such chromosome structure correlates with genetic compartmentalization, and the onset of metabolic differentiation is accompanied by a rearrangement of chromosome architecture.
- Published
- 2021
28. Computational tools for the multiscale analysis of Hi-C data in bacterial chromosomes
- Author
-
Frédéric Boccard, Virginia S. Lioy, Nelle Varoquaux, Ivan Junier, Translational microbial Evolution and Engineering (TIMC-TrEE), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC ), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Conformation et Ségrégation du chromosome bactérien (OCB), Département Biologie des Génomes (DBG), Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Institute for Integrative Biology of the Cell [Gif-sur-Yvette] (I2BC), and Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Multiscale analysis of Hi-C Data ,0303 health sciences ,business.industry ,Computer science ,Circular bacterial chromosome ,Scale (chemistry) ,[SDV]Life Sciences [q-bio] ,030302 biochemistry & molecular biology ,Python/Jupyter tools ,Python (programming language) ,Nested organization ,computer.software_genre ,Structuring ,Chromosome conformation capture ,03 medical and health sciences ,Index (publishing) ,Code (cryptography) ,The Internet ,Chromosome Interaction Domains ,Data mining ,business ,computer ,030304 developmental biology ,computer.programming_language - Abstract
Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework that aims to capture and quantify these properties. The framework includes both standard tools (e.g. contact laws) and novel ones, such as an index that identifies loci involved in domain formation independently of the structuring scale at play. Our objective is twofold. On the one hand, we aim to provide complete, understandable Python/Jupyter-based code that can be used by both computer scientists and biologists with no advanced computational background. On the other hand, we discuss statistical issues inherent to Hi-C data analysis, focusing in particular on how to properly assess the statistical significance of results. As a pedagogical example, we analyze data produced in Pseudomonas aeruginosa, a model pathogenic bacterium. All files (code and input data) can be found in a GitHub repository. We have also embedded the files into a Binder package so that the full analysis can be run on any machine over the internet.
- Published
- 2021
29. Unfolding the Genome: The Case Study of P. falciparum
- Author
-
Nelle Varoquaux and Berkeley Institute for Data Science (BIDS)
- Subjects
Statistics and Probability ,0303 health sciences ,[SDV]Life Sciences [q-bio] ,3d model ,General Medicine ,Computational biology ,Genome ,3. Good health ,Chromatin ,03 medical and health sciences ,0302 clinical medicine ,Statistics, Probability and Uncertainty ,030217 neurology & neurosurgery ,Genome architecture ,030304 developmental biology ,Genomic organization ,Mathematics - Abstract
The development of new ways to probe samples for the three-dimensional (3D) structure of DNA paves the way for in-depth and systematic analyses of the genome architecture. 3C-like methods coupled with high-throughput sequencing can now assess physical interactions between pairs of loci in a genome-wide fashion, thus enabling the creation of genome-by-genome contact maps. The spread of such protocols creates many new opportunities for methodological development: how can we infer 3D models from these contact maps? Can such models help us gain insights into biological processes? Several recent studies applied such protocols to P. falciparum (the deadliest of the five human malaria parasites), assessing its genome organization at different moments of its life cycle. With its small genome size, fairly simple (yet changing) genomic organization during its life cycle, and strong correlation between chromatin folding and gene expression, this parasite is the ideal case study for applying and developing methods to infer 3D models and use them for downstream analysis. Here, I review a set of methods used to build and analyse three-dimensional models from contact map data, with a special focus on P. falciparum’s genome organization.
- Published
- 2019
30. iced: fast and memory efficient normalization of contact maps
- Author
-
Nicolas Servant, Nelle Varoquaux, Berkeley Institute for Data Science (BIDS), Department of Statistics [Berkeley], University of California [Berkeley], University of California-University of California, Institut Curie [Paris], Institut National de la Santé et de la Recherche Médicale (INSERM), MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)
- Subjects
0106 biological sciences ,Normalization (statistics) ,0303 health sciences ,Computer science ,business.industry ,Pattern recognition ,Python (programming language) ,01 natural sciences ,03 medical and health sciences ,[INFO]Computer Science [cs] ,Artificial intelligence ,business ,computer ,ComputingMilieux_MISCELLANEOUS ,030304 developmental biology ,010606 plant biology & botany ,computer.programming_language - Abstract
- Published
- 2019
31. Resistance to Adoption of Best Practices
- Author
-
Dan Sholler, Sara Stoudt, Chris J. Kennedy, Fernando Hoces de la Guardia, Francois Lanusse, Karthik Ram, Kellie Ottoboni, Marla Stuart, Maryam Vareth, Nelle Varoquaux, Rebecca Barter, R. Stuart Geiger, Scott Peterson, and Stefan van der Walt
- Subjects
bepress|Social and Behavioral Sciences|Sociology|Work, Economy and Organizations ,bepress|Social and Behavioral Sciences|Sociology ,SocArXiv|Social and Behavioral Sciences|Sociology ,SocArXiv|Social and Behavioral Sciences|Sociology|Organizations, Occupations, and Work ,SocArXiv|Social and Behavioral Sciences|Library and Information Science ,bepress|Social and Behavioral Sciences ,SocArXiv|Social and Behavioral Sciences ,bepress|Social and Behavioral Sciences|Science and Technology Studies ,SocArXiv|Social and Behavioral Sciences|Science and Technology Studies ,bepress|Social and Behavioral Sciences|Library and Information Science - Abstract
There are many recommendations of "best practices" for those doing data science, data-intensive research, and research in general. These documents usually present a particular vision of how people should work with data and computing, recommending specific tools, activities, mechanisms, and sensibilities. However, implementation of best (or better) practices in any setting is often met with resistance from individuals and groups, who perceive some drawbacks to the proposed changes to everyday practice. We offer some definitions of resistance, identify the sources of researchers' hesitancy to adopt new ways of working, and describe some of the ways resistance is manifested in data science teams. We then offer strategies for overcoming resistance based on our group members' experiences working alongside resistors or resisting change themselves. Our discussion concluded with many remaining questions left to tackle, some of which are listed at the end of this piece.
- Published
- 2019
32. Inferring Diploid 3D Chromatin Structures from Hi-C Data
- Author
-
Alexandra Gesine Cauer, Gürkan Yardımcı, Jean-Philippe Vert, Nelle Varoquaux, and William Stafford Noble
- Abstract
The 3D organization of the genome plays a key role in many cellular processes, such as gene regulation, differentiation, and replication. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. For example, structural inference can account for noise in the data, disambiguate the distinct structures of homologous chromosomes, orient genomic regions relative to nuclear landmarks, and serve as a framework for integrating other data types. Although many methods exist to infer the 3D structure of haploid genomes, inferring a diploid structure from Hi-C data is still an open problem. Indeed, the diploid case is very challenging, because Hi-C data typically does not distinguish between homologous chromosomes. We propose a method to infer 3D diploid genomes from Hi-C data. We demonstrate the accuracy of the method on simulated data, and we also use the method to infer 3D structures for mouse chromosome X, confirming that the inactive homolog exhibits a bipartite structure, whereas the active homolog does not.
- Published
- 2019
- Full Text
- View/download PDF
33. Changes in genome organization of parasite-specific gene families during the Plasmodium transmission stages
- Author
-
Jacques Prudhomme, Ferhat Ay, Anthony Cort, Gayani Batugedara, Chiara Andolina, Evelien M. Bunnik, William Stafford Noble, David A. Fidock, Jean-Philippe Vert, Photini Sinnis, Kate B. Cook, Rita Tewari, Nelle Varoquaux, François Nosten, Lirong Shi, Leila S. Ross, Karine G. Le Roch, Declan Brady, Department of Statistics [Berkeley], University of California [Berkeley], University of California-University of California, Centre de Bioinformatique (CBIO), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Institut Curie [Paris]-MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM), Nuffield (Nuffield), University of Oxford [Oxford], Département de Mathématiques et Applications - ENS Paris (DMA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), and Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
- Subjects
0301 basic medicine ,Erythrocytes ,Chromosomal Proteins, Non-Histone ,Protozoan Proteins ,General Physics and Astronomy ,Genome ,2.1 Biological and endogenous factors ,2.2 Factors relating to the physical environment ,Malaria, Falciparum ,Aetiology ,lcsh:Science ,ComputingMilieux_MISCELLANEOUS ,Genomic organization ,Regulation of gene expression ,Genetics ,Multidisciplinary ,biology ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,3. Good health ,Chromosomal Proteins ,Infectious Diseases ,Multigene Family ,Protozoan ,Female ,Infection ,Biotechnology ,Falciparum ,Science ,Plasmodium falciparum ,Article ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Rare Diseases ,parasitic diseases ,Anopheles ,Gene family ,Animals ,Humans ,Gene ,Life Cycle Stages ,Human Genome ,General Chemistry ,Non-Histone ,biology.organism_classification ,Subtelomeric heterochromatin ,Malaria ,Vector-Borne Diseases ,030104 developmental biology ,Good Health and Well Being ,Chromobox Protein Homolog 5 ,Heterochromatin protein 1 ,lcsh:Q ,Genome, Protozoan - Abstract
The development of malaria parasites throughout their various life cycle stages is coordinated by changes in gene expression. We previously showed that the three-dimensional organization of the Plasmodium falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyze genome organization in the P. falciparum and P. vivax transmission stages. Major changes occur in the localization and interactions of genes involved in pathogenesis and immune evasion, host cell invasion, sexual differentiation, and master regulation of gene expression. Furthermore, we observe reorganization of subtelomeric heterochromatin around genes involved in host cell remodeling. Depletion of heterochromatin protein 1 (PfHP1) resulted in loss of interactions between virulence genes, confirming that PfHP1 is essential for maintenance of the repressive center. Our results suggest that the three-dimensional genome structure of human malaria parasites is strongly connected with transcriptional activity of specific gene families throughout the life cycle. The development of malaria parasites is controlled by coordinated changes in gene expression. Here, the authors show that the three-dimensional genome structure of human malaria parasites is strongly connected with transcriptional activity of specific gene families throughout the life cycles of Plasmodium falciparum and Plasmodium vivax parasites.
- Published
- 2018
34. Changes in genome organization of parasite-specific gene families during the Plasmodium transmission stages
- Author
-
Ferhat Ay, Leila S. Ross, Chiara Andolina, William Stafford Noble, Declan Brady, Kate B. Cook, Evelien M. Bunnik, Nelle Varoquaux, Gayani Batugedara, Rita Tewari, François Nosten, Photini Sinnis, Jean-Philippe Vert, Karine G. Le Roch, Lirong Shi, Jacques Prudhomme, and David A. Fidock
- Subjects
Regulation of gene expression ,Genetics ,0303 health sciences ,Liver cell ,030302 biochemistry & molecular biology ,Biology ,Genome ,3. Good health ,Subtelomeric heterochromatin ,03 medical and health sciences ,Gene expression ,Gene family ,Heterochromatin protein 1 ,Gene ,030304 developmental biology - Abstract
The development of malaria parasites throughout their various life cycle stages is controlled by coordinated changes in gene expression. We previously showed that the three-dimensional organization of the P. falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyzed genome organization in the P. falciparum and P. vivax transmission stages. Major changes occurred in the localization and interactions of genes involved in pathogenesis and immune evasion, erythrocyte and liver cell invasion, sexual differentiation and master regulation of gene expression. In addition, we observed reorganization of subtelomeric heterochromatin around genes involved in host cell remodeling. Depletion of heterochromatin protein 1 (PfHP1) resulted in loss of interactions between virulence genes, confirming that PfHP1 is essential for maintenance of the repressive center. Overall, our results suggest that the three-dimensional genome structure is strongly connected with transcriptional activity of specific gene families throughout the life cycle of human malaria parasites.
- Published
- 2018
35. The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work
- Author
-
Chris Holdgraf, Nelle Varoquaux, Charlotte Mazel-Cabasse, and R. Stuart Geiger
- Subjects
FOS: Computer and information sciences ,Software documentation ,General Computer Science ,Computer science ,020207 software engineering ,02 engineering and technology ,Data science ,Peer production ,Code (semiotics) ,Variety (cybernetics) ,Task (project management) ,Software Engineering (cs.SE) ,Computer Science - Computers and Society ,Computer Science - Software Engineering ,Documentation ,Work (electrical) ,020204 information systems ,Computer-supported cooperative work ,Computers and Society (cs.CY) ,0202 electrical engineering, electronic engineering, information engineering - Abstract
Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries" -- curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues around the formats, practices, and challenges of documentation in these largely volunteer-based projects. Many different kinds and formats of documentation exist around such libraries, and they play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment for doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more "technical" tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research, as well as help practitioners better understand how to support this often invisible and infrastructural work in their projects.
- Published
- 2018
- Full Text
- View/download PDF
36. Effective normalization for copy number variation in Hi-C data
- Author
-
Jean-Philippe Vert, Edith Heard, Nicolas Servant, Nelle Varoquaux, Emmanuel Barillot, Bodescot, Myriam, Institut Curie [Paris], Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Bioinformatique (CBIO), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Department of Statistics [Berkeley], University of California [Berkeley] (UC Berkeley), University of California (UC)-University of California (UC), Berkeley Institute for Data Science (BIDS), Génétique et Biologie du Développement, Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Département de Mathématiques et Applications - ENS Paris (DMA), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), This work was supported by the Labex Deep, the Ligue Contre le Cancer, the European Research Council (SMAC-ERC-280032), the ERC Advanced Investigator award (ERC-250367), the ABS4NGS project (ANR-11-BINF-0001), the Gordon and Betty Moore Foundation (Grant GBMF3834) and the Alfred P. Sloan Foundation (Grant 2013-10-27)., Epigenèse et développement des mammifères, Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), Collège de France - Chaire Epigénétique et mémoire cellulaire, Collège de France (CdF (institution)), Institut Curie [Paris]-MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM), MINES ParisTech - École nationale supérieure des mines de Paris, University of California [Berkeley], University of California-University of California, École normale supérieure - Paris (ENS Paris), Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Centre National de la Recherche Scientifique (CNRS)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC), Chaire Epigénétique et mémoire cellulaire, Cancer et génôme: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, MINES ParisTech - École nationale supérieure des mines de Paris-Institut Curie-Institut National de la Santé et de la Recherche Médicale (INSERM), MINES ParisTech - École nationale supérieure des mines de Paris-PSL Research University (PSL), Institut Curie-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), and École normale supérieure - Paris (ENS Paris)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
0301 basic medicine ,Normalization (statistics) ,Copy-number ,DNA Copy Number Variations ,Computer science ,Sequencing data ,[SDV.CAN]Life Sciences [q-bio]/Cancer ,Biology ,lcsh:Computer applications to medicine. Medical informatics ,Genome ,Biochemistry ,Database normalization ,Chromosome conformation capture ,03 medical and health sciences ,0302 clinical medicine ,[SDV.CAN] Life Sciences [q-bio]/Cancer ,Structural Biology ,Hi-C ,Neoplasms ,Humans ,Copy-number variation ,lcsh:QH301-705.5 ,Molecular Biology ,ComputingMilieux_MISCELLANEOUS ,030304 developmental biology ,Cancer ,Chromosome Aberrations ,0303 health sciences ,Diploid genome ,[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Genome, Human ,Applied Mathematics ,Methodology Article ,Chromosome Mapping ,Computational Biology ,Genomics ,Simple extension ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Computer Science Applications ,Data set ,Normalization ,030104 developmental biology ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,lcsh:R858-859.7 ,Algorithm - Abstract
Background: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. Results: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. Conclusions: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2256-5) contains supplementary material, which is available to authorized users.
- Published
- 2018
37. Accurate identification of centromere locations in yeast genomes using Hi-C
- Author
-
Jay Shendure, Ferhat Ay, Nelle Varoquaux, Jean-Philippe Vert, William Stafford Noble, Ivan Liachko, Maitreya J. Dunham, Joshua N. Burton, Centre de Bioinformatique (CBIO), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], and Department of Computer Science and Engineering [Seattle]
- Subjects
Centromere ,Plasmodium falciparum ,Saccharomyces cerevisiae ,Sequence assembly ,Genomics ,Computational biology ,Genome ,Deep sequencing ,Chromosome conformation capture ,Yeasts ,Genetics ,ComputingMilieux_MISCELLANEOUS ,biology ,Chromosome Mapping ,Computational Biology ,DNA Restriction Enzymes ,biology.organism_classification ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,3. Good health ,Metagenomics ,Genome, Fungal ,Software - Abstract
Centromeres are essential for proper chromosome segregation. Despite extensive research, centromere locations in yeast genomes remain difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation capture assay, Hi-C, has been re-purposed for diverse applications, including de novo genome assembly, deconvolution of metagenomic samples and inference of centromere locations. We describe a method, Centurion, that jointly infers the locations of all centromeres in a single genome from Hi-C data by exploiting the centromeres’ tendency to cluster in three-dimensional space. We first demonstrate the accuracy of Centurion in identifying known centromere locations from high coverage Hi-C data of budding yeast and a human malaria parasite. We then use Centurion to infer centromere locations in 14 yeast species. Across all microbes that we consider, Centurion predicts 89% of centromeres within 5 kb of their known locations. We also demonstrate the robustness of the approach in datasets with low sequencing depth. Finally, we predict centromere coordinates for six yeast species that currently lack centromere annotations. These results show that Centurion can be used for centromere identification for diverse species of yeast and possibly other microorganisms.
- Published
- 2015
38. Multiple dimensions of epigenetic gene regulation in the malaria parasite Plasmodium falciparum
- Author
-
William Stafford Noble, Jean-Philippe Vert, Nelle Varoquaux, Karine G. Le Roch, Ferhat Ay, and Evelien M. Bunnik
- Subjects
Plasmodium falciparum ,Article ,General Biochemistry, Genetics and Molecular Biology ,Epigenesis, Genetic ,Histones ,03 medical and health sciences ,0302 clinical medicine ,parasitic diseases ,medicine ,Nucleosome ,Epigenetics ,Gene ,030304 developmental biology ,Genetics ,Regulation of gene expression ,0303 health sciences ,Virulence ,biology ,Epigenome ,biology.organism_classification ,medicine.disease ,Malaria ,Nucleosomes ,3. Good health ,Histone ,biology.protein ,030217 neurology & neurosurgery - Abstract
Plasmodium falciparum is the most deadly human malarial parasite, responsible for an estimated 207 million cases of disease and 627,000 deaths in 2012. Recent studies reveal that the parasite actively regulates a large fraction of its genes throughout its replicative cycle inside human red blood cells and that epigenetics plays an important role in this precise gene regulation. Here, we discuss recent advances in our understanding of three aspects of epigenetic regulation in P. falciparum: changes in histone modifications, nucleosome occupancy and the three-dimensional genome structure. We compare these three aspects of the P. falciparum epigenome to those of other eukaryotes, and show that large-scale compartmentalization is particularly important in determining histone decomposition and gene regulation in P. falciparum. We conclude by presenting a gene regulation model for P. falciparum that combines the described epigenetic factors, and by discussing the implications of this model for the future of malaria research.
- Published
- 2014
39. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing
- Author
-
Emmanuel Barillot, Jean-Philippe Vert, Edith Heard, Eric Viara, Bryan R. Lajoie, Nelle Varoquaux, Chong-Jian Chen, Nicolas Servant, Job Dekker, Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Service de Bioinformatique (CURIE-BIOINFO), Institut Curie [Paris], Centre de Bioinformatique (CBIO), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Institut National de la Santé et de la Recherche Médicale (INSERM), University of Massachusetts Medical School [Worcester] (UMASS), University of Massachusetts System (UMASS), Sysra, Génétique et Biologie du Développement, Centre National de la Recherche Scientifique (CNRS)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Pierre et Marie Curie - Paris 6 (UPMC), Chaire Epigénétique et mémoire cellulaire, Collège de France (CdF (institution)), Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Pierre et Marie Curie - Paris 6 (UPMC)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), and Collège de France - Chaire Epigénétique et mémoire cellulaire
- Subjects
Genetics ,Normalization (statistics) ,Data processing ,Correction method ,Source code ,business.industry ,Chromosome conformation ,media_common.quotation_subject ,Genomics ,Bioinformatics pipeline ,Biology ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Chromosomes ,Cell Line ,Computational science ,Normalization ,Software ,Hi-C ,Data format ,Humans ,business ,Algorithms ,Alleles ,media_common - Abstract
HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0831-x) contains supplementary material, which is available to authorized users.
- Published
- 2015
40. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression
- Author
-
Karine G. Le Roch, William Stafford Noble, Evelien M. Bunnik, Sebastiaan Bol, Jean-Philippe Vert, Jacques Prudhomme, Ferhat Ay, Nelle Varoquaux, Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], Department of Cell Biology and Neuroscience [Riverside] (CBNS), University of California [Riverside] (UCR), University of California-University of California, Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Bioinformatique (CBIO), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
Genome evolution ,Bioinformatics ,Plasmodium falciparum ,Schizonts ,Biology ,Genome ,Medical and Health Sciences ,Chromosomes ,Chromosome conformation capture ,Rare Diseases ,Genetic ,Models ,Genetics ,2.2 Factors relating to the physical environment ,Developmental ,Trophozoites ,Aetiology ,Gene ,Genetics (clinical) ,ComputingMilieux_MISCELLANEOUS ,Genomic organization ,Regulation of gene expression ,Models, Genetic ,Research ,Human Genome ,Gene Expression Regulation, Developmental ,Biological Sciences ,Subtelomere ,Chromatin Assembly and Disassembly ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,3. Good health ,Chromatin ,Malaria ,Vector-Borne Diseases ,Infectious Diseases ,Good Health and Well Being ,Gene Expression Regulation ,Protozoan ,Infection ,Genome, Protozoan ,Biotechnology - Abstract
The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.
- Published
- 2014
41. A statistical approach for inferring the three-dimensional structure of the genome
- Author
-
Nelle Varoquaux, Ferhat Ay, William Noble, Jean-Philippe Vert, Centre de Bioinformatique (CBIO), Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], Department of Computer Science and Engineering, Vert, Jean-Philippe, and MINES ParisTech - École nationale supérieure des mines de Paris
- Subjects
[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,ComputingMilieux_MISCELLANEOUS ,[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM] - Abstract
Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely upon multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, thereby leading to incorrect structure reconstruction. We propose a novel approach to infer a consensus three-dimensional structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a nonmetric version of MDS, and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions.
- Published
- 2014
42. MarkUs: An Open-Source Web Application to Annotate Student Papers On-Line
- Author
-
Guillaume Moreau, Karen Reid, Mike Conley, Morgan Magnin, Severin Gehwolf, Benjamin Vialle, and Nelle Varoquaux
- Subjects
World Wide Web ,Open source ,business.industry ,Computer science ,ComputingMilieux_COMPUTERSANDEDUCATION ,Mathematics education ,Web application ,Rubric ,business ,Grading (education) - Abstract
A critical component of the learning process lies in the feedback that students receive on their work that validates their progress, identifies flaws in their thinking, and identifies skills that still need to be learned. Many higher-education institutions have developed an active pedagogy that gives students opportunities for different forms of assessment and feedback. This means that students have numerous lab exercises, assignments, and projects. Both instructors and students thus require effective tools to efficiently manage the submission, assessment, and individualized feedback of students’ work. The open-source web application MarkUs aims at meeting these needs: it facilitates the submission and assessment of students’ work. Students directly submit their work using MarkUs, rather than printing it or sending it by email. The instructors or teaching assistants use MarkUs’s interface to view the students’ work, annotate it, and fill in a marking rubric. Students use the same interface to read the annotations and learn from the assessment. Managing the students’ submissions and the instructors’ assessments within a single online system has led to several positive pedagogical outcomes: the number of late submissions has decreased, the assessment time has been drastically reduced, and students can access their results and read the instructor’s feedback immediately after the grading process is completed. Using MarkUs has also significantly reduced the time that instructors spend collecting assignments, creating the marking schemes, passing them on to graders, handling special cases, and returning work to the students. In this paper, we introduce MarkUs’ features, and illustrate their benefits for higher education through our own teaching experiences and that of our colleagues. We also describe an important benefit of the fact that the tool itself is open-source. MarkUs has been developed entirely by students, giving them a valuable learning opportunity as they work on a large software system that real users depend on. Virtuous circles indeed arise, with former users of MarkUs becoming developers and then supervisors of further development. We will conclude by drawing perspectives about forthcoming features and use, both technically and pedagogically.
- Published
- 2012
43. Proceedings of the 7th European Conference on Python in Science (EuroSciPy 2014)
- Author
-
Pierre de Buyl, Nelle Varoquaux, Université libre de Bruxelles (ULB), Centre de Bioinformatique (CBIO), MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
[INFO]Computer Science [cs] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience
44. Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C
- Author
-
William Stafford Noble, Andrew R. Hoffman, Nelle Varoquaux, Ferhat Ay, Jean-Philippe Vert, Jan E. Carette, Thanh H. Vu, Michael J. Zeitz, MINES ParisTech, Bibliothèque, Department of Genome Sciences [Seattle] (GS), University of Washington [Seattle], Palo Alto Veterans Affairs Health care System, Stanford University, Centre de Bioinformatique (CBIO), Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Institut National de la Santé et de la Recherche Médicale (INSERM), Institut Curie [Paris], Department of Microbiology and Immunology [Stanford], Stanford Medicine, Stanford University-Stanford University, Department of Computer Science and Engineering [Seattle], and MINES ParisTech - École nationale supérieure des mines de Paris
- Subjects
Histone-modifying enzymes ,Restriction Mapping ,Biology ,Near-haploid human cells ,Genome ,Chromosome conformation capture ,03 medical and health sciences ,0302 clinical medicine ,Cell Line, Tumor ,Genome architecture ,Genetics ,Humans ,Scaffold/matrix attachment region ,ChIA-PET ,030304 developmental biology ,0303 health sciences ,Chromatin conformation capture ,Leukemia ,[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Genome, Human ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Chromatin ,ChIP-sequencing ,Multi-locus chromatin contacts ,Human genome ,Three-dimensional modeling ,030217 neurology & neurosurgery ,Research Article ,Biotechnology - Abstract
Background: Several recently developed experimental methods, each an extension of the chromatin conformation capture (3C) assay, have enabled the genome-wide profiling of chromatin contacts between pairs of genomic loci in 3D. Especially in complex eukaryotes, data generated by these methods, coupled with other genome-wide datasets, demonstrated that non-random chromatin folding correlates strongly with cellular processes such as gene expression and DNA replication. Results: We describe a genome architecture assay, tethered multiple 3C (TM3C), that maps genome-wide chromatin contacts via a simple protocol of restriction enzyme digestion and religation of fragments upon agarose gel beads followed by paired-end sequencing. In addition to identifying contacts between pairs of loci, TM3C enables identification of contacts among more than two loci simultaneously. We use TM3C to assay the genome architectures of two human cell lines: KBM7, a near-haploid chronic leukemia cell line, and NHEK, a normal diploid human epidermal keratinocyte cell line. We confirm that the contact frequency maps produced by TM3C exhibit features characteristic of existing genome architecture datasets, including the expected scaling of contact probabilities with genomic distance, megabase scale chromosomal compartments and sub-megabase scale topological domains. We also confirm that TM3C captures several known cell type-specific contacts, ploidy shifts and translocations, such as Philadelphia chromosome formation (Ph+) in KBM7. We confirm a subset of the triple contacts involving the IGF2-H19 imprinting control region (ICR) using PCR analysis for KBM7 cells. Our genome-wide analysis of pairwise and triple contacts demonstrates their preference for linking open chromatin regions to each other and for linking regions with higher numbers of DNase hypersensitive sites (DHSs) to each other. For near-haploid KBM7 cells, we infer whole genome 3D models that exhibit clustering of small chromosomes with each other and large chromosomes with each other, consistent with previous studies of the genome architectures of other human cell lines. Conclusion: TM3C is a simple protocol for ascertaining genome architecture and can be used to identify simultaneous contacts among three or four loci. Application of TM3C to a near-haploid human cell line revealed large-scale features of chromosomal organization and multi-way chromatin contacts that preferentially link regions of open chromatin. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1236-7) contains supplementary material, which is available to authorized users.
- Full Text
- View/download PDF
45. SPySort: Neuronal Spike Sorting with Python
- Author
-
Pouzat, Christophe, Detorakis, Georgios Is., Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), Laboratoire des signaux et systèmes (L2S), Centre National de la Recherche Scientifique (CNRS)-CentraleSupélec-Université Paris-Sud - Paris 11 (UP11), Pierre de Buyl, Nelle Varoquaux, ANR-13-JS03-0006,SYNCHNEURO,Théorie de la commande pour la synchronisation neuronale: modélisation à partir de données l'optogénétique, et alteration boucle-fermée des oscillations cérébrales pathologiques(2013), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Mathématiques Appliquées à Paris 5 ( MAP5 - UMR 8145 ), Université Paris Descartes - Paris 5 ( UPD5 ) -Institut National des Sciences Mathématiques et de leurs Interactions-Centre National de la Recherche Scientifique ( CNRS ), Laboratoire des signaux et systèmes ( L2S ), Université Paris-Sud - Paris 11 ( UP11 ) -CentraleSupélec-Centre National de la Recherche Scientifique ( CNRS ), and ANR-13-JS03-0006,SYNCHNEURO,Théorie de la commande pour la synchronisation neuronale: modélisation à partir de données l'optogénétique, et alteration boucle-fermée des oscillations cérébrales pathologiques ( 2013 )
- Subjects
sampling theorem ,FOS: Computer and information sciences ,sampling jitter correction ,[STAT.AP]Statistics [stat]/Applications [stat.AP] ,[SDV.NEU.NB]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]/Neurobiology ,[ STAT.AP ] Statistics [stat]/Applications [stat.AP] ,Computational Engineering, Finance, and Science (cs.CE) ,Gaussian Mixture Model ,kmeans ,dimension reduction ,Quantitative Biology - Neurons and Cognition ,[ SDV.NEU.NB ] Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]/Neurobiology ,FOS: Biological sciences ,Neurons and Cognition (q-bio.NC) ,E-M algorithm ,Computer Science - Computational Engineering, Finance, and Science ,clustering - Abstract
Extracellular recordings with multi-electrode arrays are one of the basic tools of contemporary neuroscience. These recordings are mostly used to monitor the activities, understood as sequences of emitted action potentials, of many individual neurons. But the raw data produced by extracellular recordings are most commonly a mixture of activities from several neurons. In order to get the activities of the individual contributing neurons, a pre-processing step called spike sorting is required. We present here a pure Python implementation of a well-tested spike sorting procedure. The latter was designed in a modular way in order to favour a smooth transition from an interactive sorting, for instance with IPython, to an automatic one. Surprisingly enough (or sadly enough, depending on one's viewpoint), recoding our now 15-year-old procedure into Python was the occasion for major methodological improvements. Comment: Part of the Proceedings of the 7th European Conference on Python in Science (EuroSciPy 2014), Pierre de Buyl and Nelle Varoquaux editors, (2014)
- Published
- 2014
- Full Text
- View/download PDF
46. Computing an Optimal Control Policy for an Energy Storage
- Author
-
Haessig, Pierre, Kovaltchouk, Thibaut, Multon, Bernard, Ahmed, Hamid Ben, Lascaud, Stéphane, Systèmes et Applications des Technologies de l'Information et de l'Energie (SATIE), École normale supérieure - Cachan (ENS Cachan)-Université Paris-Sud - Paris 11 (UP11)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-École normale supérieure - Rennes (ENS Rennes)-Université de Cergy Pontoise (UCP), Université Paris-Seine-Université Paris-Seine-Conservatoire National des Arts et Métiers [CNAM] (CNAM)-Centre National de la Recherche Scientifique (CNRS), EDF R&D (EDF R&D), EDF (EDF), and Pierre de Buyl and Nelle Varoquaux
- Subjects
Power Smoothing ,Stochastic Dynamic Programming ,Policy Iteration Algorithm ,Ocean Wave Energy ,FOS: Electrical engineering, electronic engineering, information engineering ,[INFO.INFO-SY]Computer Science [cs]/Systems and Control [cs.SY] ,Computer Science - Systems and Control ,Autoregressive Models ,Systems and Control (eess.SY) - Abstract
We introduce StoDynProg, a small library created to solve Optimal Control problems arising in the management of Renewable Power Sources, in particular when coupled with an Energy Storage System. The library implements generic Stochastic Dynamic Programming (SDP) numerical methods which can solve a large class of Dynamic Optimization problems. We demonstrate the library's capabilities with a prototype problem: smoothing the power of an Ocean Wave Energy Converter. First we use time series analysis to derive a stochastic Markovian model of this system since it is required by Dynamic Programming. Then, we briefly describe the "policy iteration" algorithm we have implemented and the numerical tools being used. We show how the API design of the library is generic enough to address Dynamic Optimization problems outside the field of Energy Management. Finally, we solve the power smoothing problem and compare the optimal control with a simpler heuristic control. Comment: Part of the Proceedings of the 6th European Conference on Python in Science (EuroSciPy 2013), Pierre de Buyl and Nelle Varoquaux editors, (2014)
- Published
- 2013