44 results on '"Wild, David L"'
Search Results
2. Exploiting molecular dynamics in Nested Sampling simulations of small peptides
- Author
Burkoff, Nikolas S., Baldock, Robert J.N., Várnai, Csilla, Wild, David L., and Csányi, Gábor
- Published
- 2016
3. Time-Series Transcriptomics Reveals That AGAMOUS-LIKE22 Affects Primary Metabolism and Developmental Processes in Drought-Stressed Arabidopsis
- Author
Bechtold, Ulrike, Penfold, Christopher A., Jenkins, Dafyd J., Legaie, Roxane, Moore, Jonathan D., Lawson, Tracy, Matthews, Jack S.A., Vialet-Chabrand, Silvere R.M., Baxter, Laura, Subramaniam, Sunitha, Hickman, Richard, Florance, Hannah, Sambles, Christine, Salmon, Deborah L., Feil, Regina, Bowden, Laura, Hill, Claire, Baker, Neil R., Lunn, John E., Finkenstädt, Bärbel, Mead, Andrew, Buchanan-Wollaston, Vicky, Beynon, Jim, Rand, David A., Wild, David L., Denby, Katherine J., Ott, Sascha, Smirnoff, Nicholas, and Mullineaux, Philip M.
- Published
- 2016
4. Transcriptional Dynamics Driving MAMP-Triggered Immunity and Pathogen Effector-Mediated Immunosuppression in Arabidopsis Leaves Following Infection with Pseudomonas syringae pv tomato DC3000
- Author
Lewis, Laura A., Polanski, Krzysztof, de Torres-Zabala, Marta, Jayaraman, Siddharth, Bowden, Laura, Moore, Jonathan, Penfold, Christopher A., Jenkins, Dafyd J., Hill, Claire, Baxter, Laura, Kulasekaran, Satish, Truman, William, Littlejohn, George, Prusinska, Justyna, Mead, Andrew, Steinbrenner, Jens, Hickman, Richard, Rand, David, Wild, David L., Ott, Sascha, Buchanan-Wollaston, Vicky, Smirnoff, Nick, Beynon, Jim, Denby, Katherine, and Grant, Murray
- Published
- 2015
5. Arabidopsis Defense against Botrytis cinerea: Chronology and Regulation Deciphered by High-Resolution Temporal Transcriptomic Analysis
- Author
Windram, Oliver, Madhou, Priyadharshini, McHattie, Stuart, Hill, Claire, Hickman, Richard, Cooke, Emma, Jenkins, Dafyd J., Penfold, Christopher A., Baxter, Laura, Breeze, Emily, Kiddle, Steven J., Rhodes, Johanna, Atwell, Susanna, Kliebenstein, Daniel J., Kim, Youn-sung, Stegle, Oliver, Borgwardt, Karsten, Zhang, Cunjin, Tabrett, Alex, Legaie, Roxane, Moore, Jonathan, Finkenstadt, Bärbel, Wild, David L., Mead, Andrew, Rand, David, Beynon, Jim, Ott, Sascha, Buchanan-Wollaston, Vicky, and Denby, Katherine J.
- Published
- 2012
6. High-Resolution Temporal Profiling of Transcripts during Arabidopsis Leaf Senescence Reveals a Distinct Chronology of Processes and Regulation
- Author
Breeze, Emily, Harrison, Elizabeth, McHattie, Stuart, Hughes, Linda, Hickman, Richard, Hill, Claire, Kiddle, Steven, Kim, Youn-sung, Penfold, Christopher A., Jenkins, Dafyd, Zhang, Cunjin, Morris, Karl, Jenner, Carol, Jackson, Stephen, Thomas, Brian, Tabrett, Alexandra, Legaie, Roxane, Moore, Jonathan D., Wild, David L., Ott, Sascha, Rand, David, Beynon, Jim, Denby, Katherine, Mead, Andrew, and Buchanan-Wollaston, Vicky
- Published
- 2011
7. An approach to pathway reconstruction using whole genome metabolic models and sensitive sequence searching
- Author
Saqi, Mansoor, Dobson, Richard J.B., Kraben, Preben, Hodgson, David A., and Wild, David L.
- Subjects
Biotechnology, TP248.13-248.65
- Abstract
Metabolic models have the potential to impact on genome annotation and on the interpretation of gene expression and other high-throughput genome data. The genome of Streptomyces coelicolor has been sequenced, and some 30% of its open reading frames (ORFs) lack any functional annotation. A recently constructed metabolic network model for S. coelicolor highlights biochemical functions which should exist to make the metabolic model complete and consistent. These include 205 reactions with which no ORF is associated. Here we combine protein functional predictions for the unannotated open reading frames in the genome with ‘missing but expected’ functions inferred from the metabolic model. The approach allows function predictions to be evaluated in the context of the biochemical pathway reconstruction, and to feed back iteratively into the metabolic model. We describe the approach and discuss a few illustrative examples.
- Published
- 2009
8. Expectations from Structural Genomics Revisited: An Analysis of Structural Genomics Targets
- Author
Saqi, Mansoor A. S. and Wild, David L.
- Published
- 2005
9. KMD: An Open-Source Port of the ArrayExpress Microarray Database
- Author
Mainguy, Jean-Pierre, MacDonnell, Grant, Bund, Stefan, and Wild, David L.
- Published
- 2004
10. A local regulatory network around three NAC transcription factors in stress responses and senescence in Arabidopsis leaves
- Author
Hickman, Richard, Hill, Claire, Penfold, Christopher A., Breeze, Emily, Bowden, Laura, Moore, Jonathan D., Zhang, Peijun, Jackson, Alison, Cooke, Emma, Bewicke-Copley, Findlay, Mead, Andrew, Beynon, Jim, Wild, David L., Denby, Katherine J., Ott, Sascha, and Buchanan-Wollaston, Vicky
- Published
- 2013
11. Predicting protein β-sheet contacts using a maximum entropy-based correlated mutation measure
- Author
Burkoff, Nikolas S., Várnai, Csilla, and Wild, David L.
- Published
- 2013
12. Bayesian correlated clustering to integrate multiple datasets
- Author
Kirk, Paul, Griffin, Jim E., Savage, Richard S., Ghahramani, Zoubin, and Wild, David L.
- Published
- 2012
13. Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks
- Author
Penfold, Christopher A., Buchanan-Wollaston, Vicky, Denby, Katherine J., and Wild, David L.
- Published
- 2012
14. Biomarker discovery in microarray gene expression data with Gaussian processes
- Author
Chu, Wei, Ghahramani, Zoubin, Falciani, Francesco, and Wild, David L.
- Published
- 2005
15. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors
- Author
Beal, Matthew J., Falciani, Francesco, Ghahramani, Zoubin, Rangel, Claudia, and Wild, David L.
- Published
- 2005
16. Time‐series transcriptomics reveals a BBX32‐directed control of acclimation to high light in mature Arabidopsis leaves.
- Author
Alvarez‐Fernandez, Ruben, Penfold, Christopher A., Galvez‐Valdivieso, Gregorio, Exposito‐Rodriguez, Marino, Stallard, Ellie J., Bowden, Laura, Moore, Jonathan D., Mead, Andrew, Davey, Phillip A., Matthews, Jack S. A., Beynon, Jim, Buchanan‐Wollaston, Vicky, Wild, David L., Lawson, Tracy, Bechtold, Ulrike, Denby, Katherine J., and Mullineaux, Philip M.
- Subjects
PHOTORECEPTORS, ACCLIMATIZATION, GENE regulatory networks, ARABIDOPSIS
- Abstract
SUMMARY: The photosynthetic capacity of mature leaves increases after several days' exposure to constant or intermittent episodes of high light (HL) and is manifested primarily as changes in chloroplast physiology. How this chloroplast‐level acclimation to HL is initiated and controlled is unknown. From expanded Arabidopsis leaves, we determined HL‐dependent changes in transcript abundance of 3844 genes in a 0–6 h time‐series transcriptomics experiment. It was hypothesized that among such genes were those that contribute to the initiation of HL acclimation. By focusing on differentially expressed transcription (co‐)factor genes and applying dynamic statistical modelling to the temporal transcriptomics data, a regulatory network of 47 predominantly photoreceptor‐regulated transcription (co‐)factor genes was inferred. The most connected gene in this network was B‐BOX DOMAIN CONTAINING PROTEIN32 (BBX32). Plants overexpressing BBX32 were strongly impaired in acclimation to HL and displayed perturbed expression of photosynthesis‐associated genes under low light (LL) and after exposure to HL. These observations led us to demonstrate that, in addition to BBX32, CRYPTOCHROME1, LONG HYPOCOTYL5, CONSTITUTIVELY PHOTOMORPHOGENIC1 and SUPPRESSOR OF PHYA‐105 are important in regulating chloroplast‐level acclimation. In addition, the BBX32‐centric gene regulatory network provides a view of the transcriptional control of acclimation in mature leaves distinct from other photoreceptor‐regulated processes, such as seedling photomorphogenesis.
Significance Statement: Identifying genes that control plants' intrinsic photosynthetic capacity would provide opportunities for increasing crop productivity. Photosynthetic capacity in mature leaves increases after several days' exposure to episodes of high‐light intensity (HL), a response defined as HL acclimation. Using temporal transcriptomics data and dynamic modelling, we inferred a BBX32‐centric 47‐member gene regulatory network whose members control, in the first hours of HL exposure, cellular processes that result in enhanced photosynthetic capacity 5 days later. [ABSTRACT FROM AUTHOR]
- Published
- 2021
17. Modeling T-cell activation using gene expression profiling and state-space models
-
Rangel, Claudia, Angus, John, Ghahramani, Zoubin, Lioumi, Maria, Sotheran, Elizabeth, Gaiba, Alessia, Wild, David L., and Falciani, Francesco
- Published
- 2004
18. Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements
- Author
Darkins, Robert, Kirk, Paul D.W., Savage, Richard S., Cooke, Emma J., and Wild, David L.
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.
Results: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.
Conclusions: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies.
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
- Published
- 2011
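The Gaussian process regression that this clustering model builds on can be illustrated with a minimal sketch of a GP log marginal likelihood for a single expression profile measured at non-uniformly spaced time points. It assumes a zero-mean GP with a squared-exponential kernel and one Gaussian noise variance; the parameter names (`signal_var`, `length_scale`, `noise_var`) are hypothetical, this is not the BHC package's code, and the outlier mixture and replicate-informed noise prior described in the abstract are not modelled here.

```python
import numpy as np

def sq_exp_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_log_marginal_likelihood(t, y, signal_var, length_scale, noise_var):
    """log p(y | t) for a zero-mean GP with i.i.d. Gaussian observation noise.

    The time points need not be evenly spaced: the kernel only uses their differences.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    K = sq_exp_kernel(t, t, signal_var, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))            # log |K|
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical profile sampled at irregular times
times = [0.0, 0.5, 2.0, 6.0]
profile = [0.1, 0.4, 1.2, 0.3]
print(gp_log_marginal_likelihood(times, profile, 1.0, 2.0, 0.1))
```

In a model-based hierarchical clustering scheme, a quantity of this kind scores how well a candidate cluster's profiles are explained by one shared smooth curve; the Cholesky factorisation is used because it is the numerically stable way to obtain both the solve and the log-determinant.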
19. The dynamic architecture of the metabolic switch in Streptomyces coelicolor
- Author
Bonin, Michael, Wild, David L., Rand, David A., Dijkhuizen, Lubbert, Jansen, Ritsert C., Challis, Gregory L., Legaie, Roxane, Gaze, William H., Iqbal, Mudassar, Thomas, Louise, Nentwich, Merle, Rodríguez-García, Antonio, Juarez-Hermosillo, Miguel A., Morrissey, Edward R., Omara, Walid A.M., Moore, Jonathan, Merlo, Maria E., Alam, Mohammad T., Sletta, Håvard, Jakobsen, Øyvind M., Wentzel, Alexander, Bruheim, Per, Herbig, Alexander, Battke, Florian, Nieselt, Kay, Reuther, Jens, Wohlleben, Wolfgang, Smith, Margaret C.M., Burroughs, Nigel J., Martín, Juan F., Hodgson, David A., Takano, Eriko, Breitling, Rainer, Ellingsen, Trond E., and Wellington, Elizabeth M.H.
- Subjects
Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: During the lifetime of a fermenter culture, the soil bacterium S. coelicolor undergoes a major metabolic switch from exponential growth to antibiotic production. We have studied gene expression patterns during this switch, using a specifically designed Affymetrix genechip and a high-resolution time-series of fermenter-grown samples.
Results: Surprisingly, we find that the metabolic switch actually consists of multiple finely orchestrated switching events. Strongly coherent clusters of genes show drastic changes in gene expression already many hours before the classically defined transition phase where the switch from primary to secondary metabolism was expected. The main switch in gene expression takes only 2 hours, and changes in antibiotic biosynthesis genes are delayed relative to the metabolic rearrangements. Furthermore, global variation in morphogenesis genes indicates an involvement of cell differentiation pathways in the decision phase leading up to the commitment to antibiotic biosynthesis.
Conclusions: Our study provides the first detailed insights into the complex sequence of early regulatory events during and preceding the major metabolic switch in S. coelicolor, which will form the starting point for future attempts at engineering antibiotic production in a biotechnological setting.
- Published
- 2010
20. R/BHC: fast Bayesian hierarchical clustering for microarray data
- Author
Grant, Murray, Truman, William M., Ghahramani, Zoubin, Xu, Yang, Heller, Katherine, Savage, Richard S., Denby, Katherine J., and Wild, David L.
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained.
Results: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge.
Conclusion: Biologically plausible results are presented from a well studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric.
- Published
- 2009
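The merge decision described in this abstract follows the usual Bayesian hierarchical clustering recursion, in which a Dirichlet process prior sets the weight π_k of the "one cluster" hypothesis for each candidate merge. A minimal sketch in log space is below. It assumes the marginal likelihood of the merged data under a single cluster (`log_lik_h1`) is supplied by some conjugate model that is not shown; the node dictionary layout and function names are hypothetical and are not the R/BHC interface.

```python
import math

def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def make_leaf(log_lik, alpha):
    """A single data point: d = alpha and pi = 1, so p(D|T) = p(D|H1)."""
    return {"n": 1, "log_d": math.log(alpha), "log_p_tree": log_lik}

def merge(left, right, log_lik_h1, alpha):
    """Merge two subtrees; return the new node and the posterior merge probability r_k.

    d_k  = alpha * Gamma(n_k) + d_left * d_right
    pi_k = alpha * Gamma(n_k) / d_k
    p(D_k | T_k) = pi_k p(D_k | H1) + (1 - pi_k) p(D_left | T_left) p(D_right | T_right)
    r_k  = pi_k p(D_k | H1) / p(D_k | T_k)
    """
    n = left["n"] + right["n"]
    log_a_gamma = math.log(alpha) + math.lgamma(n)
    log_d_children = left["log_d"] + right["log_d"]
    log_d = log_add_exp(log_a_gamma, log_d_children)
    log_pi = log_a_gamma - log_d
    log_one_minus_pi = log_d_children - log_d
    log_h1 = log_pi + log_lik_h1  # all points in D_k share one cluster
    log_split = log_one_minus_pi + left["log_p_tree"] + right["log_p_tree"]
    log_p_tree = log_add_exp(log_h1, log_split)
    r = math.exp(log_h1 - log_p_tree)
    return {"n": n, "log_d": log_d, "log_p_tree": log_p_tree}, r

# Two hypothetical leaves with unit marginal likelihood, merged data three times as likely
a = make_leaf(0.0, 1.0)
b = make_leaf(0.0, 1.0)
node, r = merge(a, b, math.log(3.0), 1.0)
print(r)
```

In the agglomerative loop, the pair with the highest r_k is typically merged at each step, and the finished tree can be cut wherever r_k falls below 0.5; this is how the number of clusters falls out of the model rather than being fixed in advance.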
21. Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs.
- Author
Várnai, Csilla, Burkoff, Nikolas S., and Wild, David L.
- Subjects
PROTEIN-protein interactions, INTERMOLECULAR interactions, ENTROPY, THERMODYNAMIC state variables, ORTHOGONAL functions
- Abstract
Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes, by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy related correlated mutation measures (CMMs) such as direct information, decoupling direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy based CMM at the residue level, we develop an interface level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and is also likely to improve interface prediction for complexes with large and good-quality MSAs. [ABSTRACT FROM AUTHOR]
- Published
- 2017
22. MDI-GPU: accelerating integrative modelling for genomic-scale data using GP-GPU computing.
- Author
Mason, Samuel A., Sayyid, Faiz, Kirk, Paul D.W., Starr, Colin, and Wild, David L.
- Subjects
GRAPHICS processing units, SYSTEMS biology, GENOMICS, BAYESIAN analysis, CLUSTER analysis (Statistics), MATHEMATICAL models
- Abstract
The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct - but often complementary - information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: . [ABSTRACT FROM AUTHOR]
- Published
- 2016
23. Bringing numerous methods for expression and promoter analysis to a public cloud computing service.
- Author
Polański, Krzysztof, Gao, Bo, Mason, Sam A., Brown, Paul, Ott, Sascha, Denby, Katherine J., and Wild, David L.
- Subjects
ALGORITHMS, CLOUD computing, GENE expression, DATA analysis, HYPERGEOMETRIC functions
- Abstract
Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, brought them up to a good implementation standard and housed the tools in isolated Docker containers, which we integrated into the CyVerse Discovery Environment, making them easily usable for a wide community as part of the CyVerse UK project. [ABSTRACT FROM AUTHOR]
- Published
- 2018
24. CSI: a nonparametric Bayesian approach to network inference from multiple perturbed time series gene expression data.
- Author
Penfold, Christopher A., Shifaz, Ahmed, Brown, Paul E., Nicholson, Ann, and Wild, David L.
- Subjects
TIME series analysis, MATHEMATICAL statistics, PROBABILITY theory, GENE expression, MOLECULAR genetics
- Abstract
Here we introduce the causal structure identification (CSI) package, a Gaussian process based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user-friendly inference. Finally, the GUI can be connected to high performance computer clusters to facilitate analysis of large genomic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2015
25. Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm.
- Author
Darkins, Robert, Cooke, Emma J., Ghahramani, Zoubin, Kirk, Paul D. W., Wild, David L., and Savage, Richard S.
- Subjects
BAYESIAN analysis, TIME series analysis, ALGORITHMS, DEVELOPMENTAL biology, GENE expression, PROTEIN microarrays, PROBABILITY theory
- Abstract
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/. [ABSTRACT FROM AUTHOR]
- Published
- 2013
26. Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures.
- Author
Rasmussen, Carl Edward, de la Cruz, Bernard J., Ghahramani, Zoubin, and Wild, David L.
- Published
- 2009
27. CRANKITE: A fast polypeptide backbone conformation sampler.
- Author
Podtelezhnikov, Alexei A. and Wild, David L.
- Subjects
SOFTWARE samplers, MONTE Carlo method, POLYPEPTIDES, PROTEIN conformation, SIMULATION methods & models
- Abstract
Background: CRANKITE is a suite of programs for simulating backbone conformations of polypeptides and proteins. The core of the suite is an efficient Metropolis Monte Carlo sampler of backbone conformations in continuous three-dimensional space in atomic detail. Methods: In contrast to other programs relying on local Metropolis moves in the space of dihedral angles, our sampler utilizes local crankshaft rotations of rigid peptide bonds in Cartesian space. Results: The sampler allows fast simulation and analysis of secondary structure formation and conformational changes for proteins of average length. [ABSTRACT FROM AUTHOR]
- Published
- 2008
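A crankshaft move of the kind described here rotates the atoms lying between two pivot atoms about the axis joining those pivots, so the rest of the chain stays fixed and every distance inside the rotated segment, and to the pivots, is preserved. Below is a minimal Python sketch with a Metropolis acceptance step. It assumes a chain stored as an (N, 3) NumPy array of atom coordinates and a user-supplied energy function; both, and all function names, are hypothetical illustrations rather than CRANKITE's code or data model.

```python
import numpy as np

def rotate_about_axis(points, a, b, theta):
    """Rotate points by angle theta about the line through a and b (Rodrigues' formula)."""
    k = (b - a) / np.linalg.norm(b - a)   # unit rotation axis
    p = points - a                        # coordinates relative to a point on the axis
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rotated = (p * cos_t
               + np.cross(k, p) * sin_t
               + np.outer(p @ k, k) * (1.0 - cos_t))
    return rotated + a

def crankshaft_move(coords, i, j, theta):
    """Rotate the atoms strictly between pivots i and j (i + 1 < j) about the i-j axis."""
    new = coords.copy()
    new[i + 1:j] = rotate_about_axis(coords[i + 1:j], coords[i], coords[j], theta)
    return new

def metropolis_crankshaft_step(coords, energy_fn, kT, rng, max_angle=0.3):
    """Propose one random crankshaft move and accept it with the Metropolis criterion."""
    n = len(coords)
    i = rng.integers(0, n - 2)
    j = rng.integers(i + 2, n)            # guarantees at least one atom between pivots
    theta = rng.uniform(-max_angle, max_angle)
    proposal = crankshaft_move(coords, i, j, theta)
    delta_e = energy_fn(proposal) - energy_fn(coords)
    if delta_e <= 0 or rng.random() < np.exp(-delta_e / kT):
        return proposal, True
    return coords, False

def radius_of_gyration_sq(c):
    """Hypothetical toy energy: squared radius of gyration, favouring compact chains."""
    return float(np.mean(np.sum((c - c.mean(axis=0)) ** 2, axis=1)))

rng = np.random.default_rng(0)
chain = np.cumsum(rng.normal(size=(12, 3)), axis=0)  # hypothetical random-walk chain
n_accepted = 0
for _ in range(1000):
    chain, accepted = metropolis_crankshaft_step(chain, radius_of_gyration_sq, 1.0, rng)
    n_accepted += accepted
print(n_accepted)
```

Because the pivots and the rotation angle are drawn symmetrically and the rotation itself is a rigid motion, the proposal in this sketch is symmetric, so the plain Metropolis ratio is used without any correction factor.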
28. Learning about protein hydrogen bonding by minimizing contrastive divergence.
- Author
Podtelezhnikov, Alexei A., Ghahramani, Zoubin, and Wild, David L.
- Abstract
Defining the strength and geometry of hydrogen bonds in protein structures has been a challenging task since the early days of structural biology. In this article, we apply a novel statistical machine learning technique, known as contrastive divergence, to efficiently estimate both the hydrogen bond strength and the geometric characteristics of strong interpeptide backbone hydrogen bonds, from a dataset of structures representing a variety of different protein folds. Despite the simplifying assumptions of the interatomic energy terms used, we determine the strength of these hydrogen bonds to be between 1.1 and 1.5 kcal/mol, in good agreement with earlier experimental estimates. The geometry of these strong backbone hydrogen bonds features an almost linear arrangement of all four atoms involved in hydrogen bond formation. We estimate that about a quarter of all hydrogen bond donors and acceptors participate in these strong interpeptide hydrogen bonds. Proteins 2007. © 2006 Wiley-Liss, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2007
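Contrastive divergence replaces the intractable model expectation in the log-likelihood gradient with an average over samples obtained by running only a few MCMC steps started at the data. The toy sketch below applies CD-1 to a one-parameter energy E(x; θ) = θx²/2 (a zero-mean Gaussian with precision θ), using one Metropolis step per data point. The model, step sizes and function names are illustrative assumptions and bear no relation to the hydrogen-bond energy terms fitted in the paper.

```python
import numpy as np

def energy(x, theta):
    """Toy energy: E(x; theta) = theta * x^2 / 2."""
    return 0.5 * theta * x ** 2

def d_energy_d_theta(x):
    """Derivative of the toy energy with respect to theta."""
    return 0.5 * x ** 2

def metropolis_step(x, theta, rng, step=1.0):
    """One Metropolis update of every sample under p(x) proportional to exp(-E(x; theta))."""
    proposal = x + rng.normal(scale=step, size=x.shape)
    log_accept = energy(x, theta) - energy(proposal, theta)
    accept = np.log(rng.random(size=x.shape)) < log_accept
    return np.where(accept, proposal, x)

def fit_cd1(data, theta0=1.0, lr=1.0, n_iter=500, seed=0):
    """Estimate theta by CD-1: each update uses one MCMC step started at the data."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_iter):
        samples = metropolis_step(data, theta, rng)
        # d/dtheta of the mean log-likelihood is <dE/dtheta>_model - <dE/dtheta>_data;
        # CD approximates the model term with the one-step samples.
        grad = d_energy_d_theta(samples).mean() - d_energy_d_theta(data).mean()
        theta = max(theta + lr * grad, 1e-3)  # keep the precision positive
    return theta

rng = np.random.default_rng(42)
data = rng.normal(scale=np.sqrt(0.5), size=5000)  # hypothetical data, true precision 2.0
theta_hat = fit_cd1(data)
print(theta_hat, 1.0 / np.mean(data ** 2))  # CD-1 estimate vs. sample precision
```

The update only ever needs energies and their parameter derivatives at data points and at nearby MCMC samples, never the partition function, which is what makes the approach attractive for energy functions over protein conformations.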
29. Exhaustive Metropolis Monte Carlo sampling and analysis of polyalanine conformations adopted under the influence of hydrogen bonds.
- Author
Podtelezhnikov, Alexei A. and Wild, David L.
- Abstract
We propose a novel Metropolis Monte Carlo procedure for protein modeling and analyze the influence of hydrogen bonding on the distribution of polyalanine conformations. We use an atomistic model of the polyalanine chain with rigid and planar polypeptide bonds, and elastic α carbon valence geometry. We adopt a simplified energy function in which only hard-sphere repulsion and hydrogen bonding interactions between the atoms are considered. Our Metropolis Monte Carlo procedure utilizes local crankshaft moves and is combined with parallel tempering to exhaustively sample the conformations of 16-mer polyalanine. We confirm that Flory's isolated-pair hypothesis (the steric independence between the dihedral angles of individual amino acids) does not hold true in long polypeptide chains. In addition to 3₁₀- and α-helices, we identify a kink stabilized by 2 hydrogen bonds with a shared acceptor as a common structural motif. Varying the strength of hydrogen bonds, we induce the helix-coil transition in the model polypeptide chain. We compare the propensities for various hydrogen bonding patterns and determine the degree of cooperativity of hydrogen bond formation in terms of the Hill coefficient. The observed helix-coil transition is also quantified according to Zimm-Bragg theory. Proteins 2005. © 2005 Wiley-Liss, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2005
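The parallel tempering mentioned in this abstract runs copies (replicas) of the chain at several temperatures and periodically proposes to exchange configurations between neighbouring temperatures. A minimal sketch of the standard swap test follows, assuming inverse temperatures β = 1/kT and replica energies are already known; the function names and list-based layout are hypothetical.

```python
import math
import random

def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Probability of exchanging the configurations held at inverse temperatures beta_i and beta_j.

    min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) keeps every replica's
    Boltzmann distribution invariant (detailed balance for the joint chain).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_neighbour_swaps(betas, states, energies, rng=random):
    """Try to swap each adjacent pair of replicas in place; return the number accepted."""
    accepted = 0
    for k in range(len(betas) - 1):
        p = swap_acceptance(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted

rng = random.Random(0)
betas = [1.0, 0.8, 0.6, 0.4]          # hypothetical inverse-temperature ladder
states = ["s0", "s1", "s2", "s3"]     # placeholders for replica configurations
energies = [-10.0, -8.0, -9.5, -4.0]  # hypothetical replica energies
print(attempt_neighbour_swaps(betas, states, energies, rng), states)
```

Between swap attempts each replica would normally perform its own Metropolis moves (for example, crankshaft moves) at its own temperature; low-energy configurations found by hot replicas can then migrate down to the target temperature.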
30. Comment on “Efficient Monte Carlo trial moves for polypeptide simulations” [J. Chem. Phys. 123, 174905 (2005)].
- Author
Podtelezhnikov, Alexei A. and Wild, David L.
- Subjects
SYMMETRY (Physics), ROTATIONAL motion, VECTOR analysis, PLANE geometry, MONTE Carlo method, POLYPEPTIDES
- Abstract
The article demonstrates an invariance that holds during crankshaft or pivotal rotations and points out the main error in Betancourt's Monte Carlo polypeptide sampling procedure. Unit vectors u and v are introduced to define the orientations of the N-Cα and Cα-C bonds, respectively, together with unit vectors n′ and n normal to the peptide bond planes adjacent to the same α-carbon (Cα). It concludes that Betancourt's unnecessary correction introduced an error.
- Published
- 2008
31. Discovering transcriptional modules by Bayesian data integration.
- Author
Savage, Richard S., Ghahramani, Zoubin, Griffin, Jim E., De la Cruz, Bernard J., and Wild, David L.
- Subjects
COMPUTERS in medicine, BAYESIAN analysis, GENE expression, TRANSCRIPTION factors, GENETIC transcription
- Abstract
Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. [ABSTRACT FROM PUBLISHER]
- Published
- 2010
32. Computational approaches to the integration of gene expression, ChIP-chip and sequence data in the inference of gene regulatory networks
- Author
Cooke, Emma J., Savage, Richard S., and Wild, David L.
- Subjects
GENE expression, GENETIC regulation, TRANSCRIPTION factors, DEVELOPMENTAL biology, CHROMATIN, SYSTEMS biology, NUCLEOTIDE sequence
- Abstract
Abstract: A major challenge in systems biology is the ability to model complex regulatory interactions, such as gene regulatory networks, and a number of computational approaches have been developed over recent years to address this challenge. This paper reviews a number of these approaches, with a focus on probabilistic graphical models and the integration of diverse data sets, such as gene expression and transcription factor binding site location and activity. [Copyright © Elsevier]
- Published
- 2009
33. Inferring Gene Regulatory Networks from Multiple Datasets.
- Author
Penfold, C.A., Gherman, I., Sybirna, A., and Wild, D.L.
- Subjects
- Algorithms, Bayes Theorem, Gene Expression Profiling instrumentation, Gene Expression Profiling methods, Normal Distribution, Spatio-Temporal Analysis, Systems Biology instrumentation, Datasets as Topic, Gene Regulatory Networks, Models, Genetic, Systems Biology methods
- Abstract
Gaussian process dynamical systems (GPDS) represent Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for the learning of biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches are able to capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow for inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, either by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to leverage experimentally determined network structures from one species into another where the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS .
- Published
- 2019
34. Inferring orthologous gene regulatory networks using interspecies data fusion.
- Author
Penfold, C.A., Millar, J.B., and Wild, D.L.
- Subjects
- Algorithms, Bayes Theorem, Cell Cycle genetics, Computer Simulation, Models, Genetic, Saccharomyces cerevisiae genetics, Schizosaccharomyces genetics, Software, Gene Expression Profiling, Gene Regulatory Networks
- Abstract
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression.
Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.
Availability and Implementation: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/. (© The Author 2015. Published by Oxford University Press.)
- Published
- 2015
35. Efficient Parameter Estimation of Generalizable Coarse-Grained Protein Force Fields Using Contrastive Divergence: A Maximum Likelihood Approach.
- Author
Várnai C, Burkoff NS, and Wild DL
- Abstract
Maximum likelihood (ML) optimization schemes are widely used for parameter inference. They iteratively maximize the likelihood of experimentally observed data with respect to the model parameters by following the gradient of the log-likelihood. Here, we employ an ML inference scheme to infer a generalizable, physics-based coarse-grained protein model (which includes Gō-like biasing terms to stabilize secondary structure elements in room-temperature simulations), using native conformations of a training set of proteins as the observed data. Contrastive divergence, a novel statistical machine learning technique, is used to efficiently approximate the direction of the gradient ascent, which enables the use of a large training set of proteins. Unlike previous work, the generalizability of the protein model allows the folding of peptides and a protein (protein G) that are not part of the training set. We compare the same force field with different van der Waals (vdW) potential forms: a hard cutoff model, and a Lennard-Jones (LJ) potential with vdW parameters inferred or adopted from the CHARMM or AMBER force fields. Simulations of peptides and protein G show that the LJ model with inferred parameters outperforms the hard cutoff potential, which is consistent with previous observations. Simulations using the LJ potential with inferred vdW parameters also outperform the protein models with adopted vdW parameter values, demonstrating that model parameters generally cannot be transferred between force fields with different energy functions. The software is available at https://sites.google.com/site/crankite/.
- Published
- 2013
- View/download PDF
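The contrastive divergence (CD) idea, replacing the intractable model expectation in the log-likelihood gradient with samples from a few MCMC steps started at the data, can be illustrated on a deliberately tiny model. This sketch is not the crankite force field or its training code: it assumes a one-parameter energy E(x) = θx²/2 (a Gaussian with precision θ), computes the data term of the gradient exactly, and approximates the model term with k random-walk Metropolis steps started at each data point (CD-k). The function names and all settings (learning rate, step size, iteration count) are hypothetical choices for illustration.

```python
import math
import random


def metropolis_step(x, theta, step, rng):
    """One random-walk Metropolis update targeting p(x | theta) ∝ exp(-theta * x**2 / 2)."""
    proposal = x + step * rng.gauss(0.0, 1.0)
    log_accept = -theta * (proposal ** 2 - x ** 2) / 2.0
    if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
        return proposal
    return x


def contrastive_divergence(data, theta=1.0, lr=2.0, n_iter=400, k=1, step=0.5, seed=0):
    """Fit the precision theta of the toy energy E(x) = theta * x**2 / 2 by CD-k."""
    rng = random.Random(seed)
    # dE/dtheta = x**2 / 2; its average over the data does not change between iterations
    data_stat = sum(x * x / 2.0 for x in data) / len(data)
    for _ in range(n_iter):
        # Negative phase: a short chain started at every data point (the CD-k approximation)
        model_stat = 0.0
        for x in data:
            y = x
            for _ in range(k):
                y = metropolis_step(y, theta, step, rng)
            model_stat += y * y / 2.0
        model_stat /= len(data)
        # d log L / d theta = -<dE/dtheta>_data + <dE/dtheta>_model (model term approximated)
        theta += lr * (model_stat - data_stat)
    return theta
```

With `rng = random.Random(1)` and `data = [rng.gauss(0.0, 0.5) for _ in range(1000)]` (true precision 1 / 0.5² = 4), `contrastive_divergence(data)` should settle near 4. Because each chain starts at the data, the data distribution is already stationary when θ is correct, so the CD update vanishes there on average even with k = 1.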
36. Exploring the energy landscapes of protein folding simulations with Bayesian computation.
- Author
Burkoff NS, Várnai C, Wells SA, and Wild DL
- Subjects
- Bacterial Proteins chemistry, Bayes Theorem, Peptides chemistry, Protein Structure, Secondary, Thermodynamics, Models, Molecular, Protein Folding
- Abstract
Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through simple post-processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. (Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.)
- Published
- 2012
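A serial, one-dimensional sketch of the basic nested sampling loop may help make the description above concrete. It is a toy under stated assumptions, not the parallel implementation or the Gō-like force field used in the paper: the prior is uniform on [-1, 1], the likelihood is an unnormalized Gaussian of width σ = 0.1, the prior-volume shrinkage is taken at its expected value log X_i ≈ -i/N, and the constrained replacement draw is exact only because the region L(x) > L* is an interval here (real applications need MCMC for that step). All names and parameters are hypothetical.

```python
import math
import random


def log_likelihood(x, sigma=0.1):
    """Toy unnormalized Gaussian log-likelihood centred at 0."""
    return -x * x / (2.0 * sigma * sigma)


def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)), allowing a = -inf."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(n_live=200, n_iter=2000, sigma=0.1, seed=0):
    """Estimate log Z = log ∫ L(x) π(x) dx for a uniform prior π on [-1, 1]."""
    rng = random.Random(seed)
    live = [rng.uniform(-1.0, 1.0) for _ in range(n_live)]
    log_width = math.log1p(-math.exp(-1.0 / n_live))  # log(1 - e^{-1/N})
    log_z, log_x = -math.inf, 0.0  # running evidence; log prior volume, X_0 = 1
    for _ in range(n_iter):
        # The live point with the lowest likelihood defines the current threshold L*
        worst = min(range(n_live), key=lambda j: log_likelihood(live[j], sigma))
        log_l_star = log_likelihood(live[worst], sigma)
        # Weight w_i = X_{i-1} - X_i = X_{i-1} (1 - e^{-1/N})
        log_z = logaddexp(log_z, log_l_star + log_x + log_width)
        log_x -= 1.0 / n_live
        # Replace it by a prior draw with L(x) > L*; for this toy that is the interval |x| < r
        r = abs(live[worst])
        live[worst] = rng.uniform(-r, r)
    # Remaining live points share the final prior volume X equally
    for x in live:
        log_z = logaddexp(log_z, log_likelihood(x, sigma) + log_x - math.log(n_live))
    return log_z
```

For these settings the exact evidence is Z ≈ ½·√(2π)·0.1 ≈ 0.125, so `nested_sampling()` should return a value close to log Z ≈ -2.08; the statistical error of the estimate shrinks roughly as 1/√N in the number of live points.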
37. How to infer gene networks from expression profiles, revisited.
- Author
Penfold CA and Wild DL
- Abstract
Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation (SDE) models. The performance of these competing approaches has previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches infer network structures more accurately than traditional approaches do, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODE models.
- Published
- 2011
38. Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements.
- Author
Cooke EJ, Savage RS, Kirk PD, Darkins R, and Wild DL
- Subjects
- Algorithms, Cluster Analysis, Gene Expression Profiling, Humans, Models, Biological, Normal Distribution, Saccharomyces cerevisiae, Bayes Theorem, Oligonucleotide Array Sequence Analysis methods
- Abstract
Background: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. Results: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher-quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. Conclusions: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies.
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
- Published
- 2011
39. A robust Bayesian two-sample test for detecting intervals of differential gene expression in microarray time series.
- Author
Stegle O, Denby KJ, Cooke EJ, Wild DL, Ghahramani Z, and Borgwardt KM
- Subjects
- Arabidopsis microbiology, Area Under Curve, Bayes Theorem, Computational Biology, Genes, Plant genetics, Models, Genetic, Multigene Family genetics, Normal Distribution, Time Factors, Arabidopsis genetics, Gene Expression Profiling, Gene Expression Regulation, Plant, Oligonucleotide Array Sequence Analysis methods
- Abstract
Understanding the regulatory mechanisms that are responsible for an organism's response to environmental change is an important issue in molecular biology. A first and important step towards this goal is to detect genes whose expression levels are affected by altered external conditions. A range of methods to test for differential gene expression, in both static and time-course experiments, has been proposed. While these tests answer the question of whether a gene is differentially expressed, they do not explicitly address the question of when a gene is differentially expressed, although this information may provide insights into the course and causal structure of regulatory programs. In this article, we propose a two-sample test for identifying intervals of differential gene expression in microarray time series. Our approach is based on Gaussian process regression, can deal with arbitrary numbers of replicates, and is robust with respect to outliers. We apply our algorithm to study the response of Arabidopsis thaliana genes to infection by a fungal pathogen using a microarray time series dataset covering 30,336 gene probes at 24 observed time points. In classification experiments, our test compares favorably with existing methods and provides additional insights into time-dependent differential expression.
- Published
- 2010
- View/download PDF
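The core comparison behind a Gaussian-process two-sample test can be sketched as a Bayes factor between two hypotheses: both conditions share one underlying expression profile, or each condition has its own. The sketch below makes strong simplifying assumptions that the paper does not: fixed squared-exponential hyperparameters, plain Gaussian noise (no robust outlier model), and a single whole-series comparison rather than the detection of intervals of differential expression. Function names and default values are hypothetical.

```python
import math


def se_kernel(a, b, amplitude=1.0, lengthscale=2.0):
    """Squared-exponential covariance between two time points."""
    return amplitude ** 2 * math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gp_log_marginal(times, values, noise=0.1):
    """log N(values | 0, K + noise^2 I) for a zero-mean GP with fixed hyperparameters."""
    n = len(times)
    cov = [[se_kernel(times[i], times[j]) + (noise ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = []  # forward substitution L z = y, so that y^T K^{-1} y = z^T z
    for i in range(n):
        z.append((values[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def log_bayes_factor(t1, y1, t2, y2, noise=0.1):
    """log p(data | separate GPs) - log p(data | one shared GP); > 0 favours a difference."""
    separate = gp_log_marginal(t1, y1, noise) + gp_log_marginal(t2, y2, noise)
    shared = gp_log_marginal(t1 + t2, y1 + y2, noise)
    return separate - shared
```

For example, with `t = [float(i) for i in range(10)]` and `control = [math.sin(x / 2.0) for x in t]`, `log_bayes_factor(t, control, t, control)` is negative, while shifting the second series up by 2 makes it strongly positive. Repeating the comparison over sliding windows of time points would be one way to move from "whether" towards "when", but that extension is not shown here.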
40. The dynamic architecture of the metabolic switch in Streptomyces coelicolor.
- Author
Nieselt K, Battke F, Herbig A, Bruheim P, Wentzel A, Jakobsen ØM, Sletta H, Alam MT, Merlo ME, Moore J, Omara WA, Morrissey ER, Juarez-Hermosillo MA, Rodríguez-García A, Nentwich M, Thomas L, Iqbal M, Legaie R, Gaze WH, Challis GL, Jansen RC, Dijkhuizen L, Rand DA, Wild DL, Bonin M, Reuther J, Wohlleben W, Smith MC, Burroughs NJ, Martín JF, Hodgson DA, Takano E, Breitling R, Ellingsen TE, and Wellington EM
- Subjects
- Anti-Bacterial Agents biosynthesis, Cluster Analysis, Fermentation, Gene Expression Regulation, Bacterial, Genes, Bacterial, Multigene Family, RNA, Bacterial genetics, Streptomyces coelicolor growth & development, Gene Expression Profiling, Streptomyces coelicolor genetics, Streptomyces coelicolor metabolism
- Abstract
Background: During the lifetime of a fermenter culture, the soil bacterium S. coelicolor undergoes a major metabolic switch from exponential growth to antibiotic production. We have studied gene expression patterns during this switch, using a specifically designed Affymetrix genechip and a high-resolution time series of fermenter-grown samples. Results: Surprisingly, we find that the metabolic switch actually consists of multiple finely orchestrated switching events. Strongly coherent clusters of genes show drastic changes in gene expression many hours before the classically defined transition phase, in which the switch from primary to secondary metabolism was expected. The main switch in gene expression takes only 2 hours, and changes in antibiotic biosynthesis genes are delayed relative to the metabolic rearrangements. Furthermore, global variation in morphogenesis genes indicates an involvement of cell differentiation pathways in the decision phase leading up to the commitment to antibiotic biosynthesis. Conclusions: Our study provides the first detailed insights into the complex sequence of early regulatory events during and preceding the major metabolic switch in S. coelicolor, which will form the starting point for future attempts at engineering antibiotic production in a biotechnological setting.
- Published
- 2010
41. R/BHC: fast Bayesian hierarchical clustering for microarray data.
- Author
Savage RS, Heller K, Xu Y, Ghahramani Z, Truman WM, Grant M, Denby KJ, and Wild DL
- Subjects
- Algorithms, Arabidopsis genetics, Bayes Theorem, Cluster Analysis, Oligonucleotide Array Sequence Analysis, Time Factors, Gene Expression Profiling methods, Software Design
- Abstract
Background: Although clustering methods have rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. Results: We present an R/Bioconductor port of a fast, novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. Conclusion: Biologically plausible results are presented from a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example having to decide how many clusters there should be and how to choose a principled distance metric.
- Published
- 2009
- View/download PDF
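The merge rule described above can be sketched for one-dimensional data. Following the general Bayesian hierarchical clustering recursion, a Dirichlet process prior with concentration α gives each candidate merge a prior weight π_k, and the merge is scored by r_k = π_k p(D_k | H1) / p(D_k | T_k), where p(D_k | T_k) = π_k p(D_k | H1) + (1 - π_k) p(D_i | T_i) p(D_j | T_j). This toy version assumes a conjugate Gaussian model with known noise σ and a N(0, τ²) prior on each cluster mean, not the models shipped in the R/BHC package, and it cuts the finished tree wherever r_k < 0.5. All names and default values are illustrative.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


def log_marginal(xs, sigma=0.2, tau=5.0):
    """log p(D | H1): x_i ~ N(mu, sigma^2) with one shared mean mu ~ N(0, tau^2)."""
    n, s, ss = len(xs), sum(xs), sum(x * x for x in xs)
    s2, t2 = sigma ** 2, tau ** 2
    return (-0.5 * n * math.log(2 * math.pi * s2)
            - 0.5 * math.log(1 + n * t2 / s2)
            - (ss - t2 * s * s / (s2 + n * t2)) / (2 * s2))


def logaddexp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


@dataclass
class Node:
    members: List[float]
    log_d: float  # log d_k from the Dirichlet process recursion
    log_p: float  # log p(D_k | T_k), the evidence for this subtree
    r: float = 1.0  # posterior probability that this merge is justified
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bhc(xs, alpha=1.0):
    """Greedily merge the pair of subtrees with the highest r_k until one tree remains."""
    log_a = math.log(alpha)
    nodes = [Node([x], log_a, log_marginal([x])) for x in xs]  # leaves: d_i = alpha
    while len(nodes) > 1:
        best = None
        for i, j in combinations(range(len(nodes)), 2):
            a, b = nodes[i], nodes[j]
            members = a.members + b.members
            n = len(members)
            log_d = logaddexp(log_a + math.lgamma(n), a.log_d + b.log_d)
            log_pi = log_a + math.lgamma(n) - log_d  # pi_k = alpha Gamma(n_k) / d_k
            log_1m_pi = a.log_d + b.log_d - log_d  # 1 - pi_k = d_i d_j / d_k
            log_h1 = log_marginal(members)
            log_p = logaddexp(log_pi + log_h1, log_1m_pi + a.log_p + b.log_p)
            r = math.exp(log_pi + log_h1 - log_p)
            if best is None or r > best[0]:
                best = (r, i, j, Node(members, log_d, log_p, r, a, b))
        _, i, j, merged = best
        nodes = [nd for k, nd in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def cut(node, threshold=0.5):
    """Keep a subtree as one cluster if its merge has r_k > threshold, else split it."""
    if node.left is None or node.r > threshold:
        return [sorted(node.members)]
    return cut(node.left, threshold) + cut(node.right, threshold)
```

For instance, `cut(bhc([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]))` yields the two groups of three without the number of clusters being specified. The exhaustive pair search is only meant for small examples.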
42. Reconstruction and stability of secondary structure elements in the context of protein structure prediction.
- Author
Podtelezhnikov AA and Wild DL
- Subjects
- Algorithms, Artificial Intelligence, Bacterial Proteins chemistry, Computer Simulation, Escherichia coli, Hydrogen Bonding, Models, Molecular, Monte Carlo Method, Nerve Tissue Proteins chemistry, Peptides chemistry, Plant Proteins chemistry, Temperature, src Homology Domains, src-Family Kinases chemistry, Models, Chemical, Protein Stability, Protein Structure, Secondary
- Abstract
Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing alpha-helices and beta-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in beta-sheets and between the turns of alpha-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded beta-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.
- Published
- 2009
43. Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction.
- Author
Chu W, Ghahramani Z, Podtelezhnikov A, and Wild DL
- Subjects
- Algorithms, Computational Biology methods, Databases, Protein, Elapid Venoms chemistry, Elapid Venoms genetics, Internet, Likelihood Functions, Markov Chains, Models, Molecular, Models, Statistical, Nerve Tissue Proteins chemistry, Nerve Tissue Proteins genetics, ROC Curve, Reproducibility of Results, Bayes Theorem, Protein Structure, Secondary, Sequence Alignment methods
- Abstract
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of variable length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and that the generalization performance is promising. By incorporating the information from long-range interactions in beta-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.
- Published
- 2006
- View/download PDF
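The segmental (semi-Markov) idea, in which a hidden state emits a whole segment of some length and type rather than a single residue, can be sketched with a forward recursion that sums the likelihood over every possible segmentation. The sketch assumes per-position log emission scores are supplied directly (the paper instead builds its likelihood from multiple sequence alignment profiles), a fixed maximum segment length, and explicit segment-type transition probabilities; a brute-force enumeration is included only as a check. All names are hypothetical.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a list that may contain -inf."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def segment_score(emit, start, end, s, log_len):
    """Log-probability that positions start..end-1 form one segment of type s."""
    return log_len[s][end - start - 1] + sum(emit[u][s] for u in range(start, end))


def segmental_forward(emit, log_init, log_trans, log_len):
    """log p(x) summed over all segmentations (semi-Markov forward recursion).

    emit[t][s]      : log p(x_t | segment type s)
    log_init[s]     : log p(first segment has type s)
    log_trans[a][b] : log p(next segment has type b | previous type a)
    log_len[s][d-1] : log p(segment length d | type s), for d = 1..max_len
    """
    n, n_types, max_len = len(emit), len(log_init), len(log_len[0])
    # alpha[t][s] = log p(x_0..x_{t-1}, a segment of type s ends exactly at t)
    alpha = [[-math.inf] * n_types for _ in range(n + 1)]
    for t in range(1, n + 1):
        for s in range(n_types):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                seg = segment_score(emit, t - d, t, s, log_len)
                if d == t:  # this segment starts the sequence
                    terms.append(log_init[s] + seg)
                else:
                    prev = logsumexp([alpha[t - d][p] + log_trans[p][s] for p in range(n_types)])
                    terms.append(prev + seg)
            alpha[t][s] = logsumexp(terms)
    return logsumexp(alpha[n])


def brute_force(emit, log_init, log_trans, log_len):
    """Enumerate every segmentation explicitly; only feasible for tiny inputs."""
    n, n_types, max_len = len(emit), len(log_init), len(log_len[0])
    totals = []

    def extend(start, prev_type, acc):
        if start == n:
            totals.append(acc)
            return
        for d in range(1, min(max_len, n - start) + 1):
            for s in range(n_types):
                step = log_init[s] if prev_type is None else log_trans[prev_type][s]
                extend(start + d, s, acc + step + segment_score(emit, start, start + d, s, log_len))

    extend(0, None, 0.0)
    return logsumexp(totals)
```

Setting the diagonal of `log_trans` to `-math.inf` forbids two adjacent segments of the same type, which is one way to make every segment boundary a genuine change of secondary structure type. The forward pass costs O(T · S² · max_len) instead of growing exponentially with the sequence length T.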
44. Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model.
- Author
Chu W, Ghahramani Z, Krause R, and Wild DL
- Subjects
- Algorithms, Bayes Theorem, Computational Biology, Computer Simulation, DNA-Directed RNA Polymerases chemistry, DNA-Directed RNA Polymerases isolation & purification, Likelihood Functions, Mass Spectrometry, Models, Molecular, Saccharomyces cerevisiae Proteins chemistry, Saccharomyces cerevisiae Proteins isolation & purification, Multiprotein Complexes chemistry, Multiprotein Complexes isolation & purification
- Abstract
We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/wild/PSBO6/index.html.
- Published
- 2006
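Infinite latent feature models are commonly built on the Indian buffet process (IBP), a prior over binary item-by-feature matrices whose number of columns is not fixed in advance; this is what lets the number of complexes be inferred from the data rather than set beforehand. The sketch below only draws from that prior, reading rows as proteins and columns as candidate complexes. It omits the graph diffusion kernel likelihood and the Gibbs sampler described in the abstract, and its function names are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_items, alpha, seed=0):
    """Draw a binary matrix Z (items x latent features) from the IBP prior."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of earlier items that already have feature k
    rows = []
    for i in range(1, n_items + 1):
        # Each existing feature k is taken with probability m_k / i
        row = []
        for k in range(len(counts)):
            take = rng.random() < counts[k] / i
            row.append(1 if take else 0)
            counts[k] += take
        # ... plus Poisson(alpha / i) brand-new features introduced by item i
        n_new = sample_poisson(rng, alpha / i)
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    width = len(counts)
    return [row + [0] * (width - len(row)) for row in rows]
```

A row may contain several 1s, which corresponds to a protein belonging to more than one complex; under this prior the expected number of columns is α(1 + 1/2 + ... + 1/n) for n items.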
Discovery Service for Jio Institute Digital Library