397 results
Search Results
2. Personalized glucose forecasting for type 2 diabetes using data assimilation.
- Author
- Albers, David J., Levine, Matthew, Gluckman, Bruce, Ginsberg, Henry, Hripcsak, George, and Mamykina, Lena
- Subjects
- BLOOD sugar monitoring, TYPE 2 diabetes, QUALITY of life, GLYCEMIC control, BAYESIAN analysis, GAUSSIAN processes
- Abstract
Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast an impact of a given meal on an individual’s blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to make a personalized model selection for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. 
The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed in accuracy expert forecasts. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges. [ABSTRACT FROM AUTHOR]
- Published
- 2017
3. Enzyme sequestration by the substrate: An analysis in the deterministic and stochastic domains.
- Author
- Petrides, Andreas and Vinnicombe, Glenn
- Subjects
- PHOSPHORYLATION, PHOSPHATASES, KINASES, ENZYMES, SEQUESTRATION (Chemistry)
- Abstract
This paper is concerned with the potential multistability of protein concentrations in the cell. That is, situations where one, or a family of, proteins may sit at one of two or more different steady state concentrations in otherwise identical cells, and in spite of them being in the same environment. For models of multisite protein phosphorylation for example, in the presence of excess substrate, it has been shown that the achievable number of stable steady states can increase linearly with the number of phosphosites available. In this paper, we analyse the consequences of adding enzyme docking to these and similar models, with the resultant sequestration of phosphatase and kinase by the fully unphosphorylated and by the fully phosphorylated substrates respectively. In the large molecule numbers limit, where deterministic analysis is applicable, we prove that there are always values for these rates of sequestration which, when exceeded, limit the extent of multistability. For the models considered here, these numbers are much smaller than the affinity of the enzymes to the substrate when it is in a modifiable state. As substrate enzyme-sequestration is increased, we further prove that the number of steady states will inevitably be reduced to one. For smaller molecule numbers a stochastic analysis is more appropriate, where multistability in the large molecule numbers limit can manifest itself as multimodality of the probability distribution; the system spending periods of time in the vicinity of one mode before jumping to another. Here, we find that substrate enzyme sequestration can induce bimodality even in systems where only a single steady state can exist at large numbers. 
To facilitate this analysis, we develop a weakly chained diagonally dominant M-matrix formulation of the Chemical Master Equation, allowing greater insights in the way particular mechanisms, like enzyme sequestration, can shape probability distributions and therefore exhibit different behaviour across different regimes. [ABSTRACT FROM AUTHOR]
- Published
- 2018
4. Ten simple rules to create biological network figures for communication.
- Author
- Marai, G. Elisabeta, Pinaud, Bruno, Bühler, Katja, Lex, Alexander, and Morris, John H.
- Subjects
- TELECOMMUNICATION systems, BIOLOGICAL networks, MEDICAL literature, PHYSICAL sciences, REFERENCE sources, BIOLOGY
- Abstract
Biological network figures are ubiquitous in the biology and medical literature. On the one hand, a good network figure can quickly provide information about the nature and degree of interactions between items and enable inferences about the reason for those interactions. On the other hand, good network figures are difficult to create. In this paper, we outline 10 simple rules for creating biological network figures for communication, from choosing layouts, to applying color or other channels to show attributes, to the use of layering and separation. These rules are accompanied by illustrative examples. We also provide a concise set of references and additional resources for each rule. [ABSTRACT FROM AUTHOR]
- Published
- 2019
5. Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data.
- Author
- Ralph, Duncan K. and Matsen IV, Frederick A.
- Subjects
- B cell receptors, IMMUNOGLOBULIN genes, B cells, ALLELES
- Abstract
The collection of immunoglobulin genes in an individual’s germline, which gives rise to B cell receptors via recombination, is known to vary significantly across individuals. In humans, for example, each individual has only a fraction of the several hundred known V alleles. Furthermore, the currently-accepted set of known V alleles is both incomplete (particularly for non-European samples), and contains a significant number of spurious alleles. The resulting uncertainty as to which immunoglobulin alleles are present in any given sample results in inaccurate B cell receptor sequence annotations, and in particular inaccurate inferred naive ancestors. In this paper we first show that the currently widespread practice of aligning each sequence to its closest match in the full set of IMGT alleles results in a very large number of spurious alleles that are not in the sample’s true set of germline V alleles. We then describe a new method for inferring each individual’s germline gene set from deep sequencing data, and show that it improves upon existing methods by making a detailed comparison on a variety of simulated and real data samples. This new method has been integrated into the partis annotation and clonal family inference package, available at , and is run by default without affecting overall run time. [ABSTRACT FROM AUTHOR]
- Published
- 2019
6. PrediTALE: A novel model learned from quantitative data allows for new perspectives on TALE targeting.
- Author
- Erkes, Annett, Mücke, Stefanie, Reschke, Maik, Boch, Jens, and Grau, Jan
- Subjects
- TANDEM repeats, PLANT genes, NUCLEOTIDE sequence, COMPUTATIONAL biology, GENE targeting, FORKHEAD transcription factors
- Abstract
Plant-pathogenic Xanthomonas bacteria secrete transcription activator-like effectors (TALEs) into host cells, where they act as transcriptional activators on plant target genes to support bacterial virulence. TALEs have a unique modular DNA-binding domain composed of tandem repeats. Two amino acids within each tandem repeat, termed repeat-variable diresidues, bind to contiguous nucleotides on the DNA sequence and determine target specificity. In this paper, we propose a novel approach for TALE target prediction to identify potential virulence targets. Our approach accounts for recent findings concerning TALE targeting, including frame-shift binding by repeats of aberrant lengths, and the flexible strand orientation of target boxes relative to the transcription start of the downstream target gene. The computational model can account for dependencies between adjacent RVD positions. Model parameters are learned from the wealth of quantitative data that have been generated over the last years. We benchmark the novel approach, termed PrediTALE, using RNA-seq data after Xanthomonas infection in rice, and find an overall improvement of prediction performance compared with previous approaches. Using PrediTALE, we are able to predict several novel putative virulence targets. However, we also observe that no target genes are predicted by any prediction tool for several TALEs, which we term orphan TALEs for this reason. We postulate that one explanation for orphan TALEs are incomplete gene annotations and, hence, propose to replace promoterome-wide by genome-wide scans for target boxes. We demonstrate that known targets from promoterome-wide scans may be recovered by genome-wide scans, whereas the latter, combined with RNA-seq data, are able to detect putative targets independent of existing gene annotations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
7. Properties of cardiac conduction in a cell-based computational model.
- Author
- Jæger, Karoline Horgmo, Edwards, Andrew G., McCulloch, Andrew, and Tveito, Aslak
- Subjects
- CARDIAC arrest, HEART cells, HEART conduction system, COMPUTATIONAL acoustics, SODIUM channels
- Abstract
The conduction of electrical signals through cardiac tissue is essential for maintaining the function of the heart, and conduction abnormalities are known to potentially lead to life-threatening arrhythmias. The properties of cardiac conduction have therefore been the topic of intense study for decades, but a number of questions related to the mechanisms of conduction still remain unresolved. In this paper, we demonstrate how the so-called EMI model may be used to study some of these open questions. In the EMI model, the extracellular space, the cell membrane, the intracellular space and the cell connections are all represented as separate parts of the computational domain, and the model therefore allows for study of local properties that are hard to represent in the classical homogenized bidomain or monodomain models commonly used to study cardiac conduction. We conclude that a non-uniform sodium channel distribution increases the conduction velocity and decreases the time delays over gap junctions of reduced coupling in the EMI model simulations. We also present a theoretical optimal cell length with respect to conduction velocity and consider the possibility of ephaptic coupling (i.e. cell-to-cell coupling through the extracellular potential) acting as an alternative or supporting mechanism to gap junction coupling. We conclude that for a non-uniform distribution of sodium channels and a sufficiently small intercellular distance, ephaptic coupling can influence the dynamics of the sodium channels and potentially provide cell-to-cell coupling when the gap junction connection is absent. [ABSTRACT FROM AUTHOR]
- Published
- 2019
8. Predicting the mechanism and rate of H-NS binding to AT-rich DNA.
- Author
- Riccardi, Enrico, van Mastbergen, Eva C., Navarre, William Wiley, and Vreede, Jocelyne
- Subjects
- BACTERIA, ARGININE, DNA, BIOCHEMISTRY, PROTEINS
- Abstract
Bacteria contain several nucleoid-associated proteins that organize their genomic DNA into the nucleoid by bending, wrapping or bridging DNA. The Histone-like Nucleoid Structuring protein H-NS found in many Gram-negative bacteria is a DNA bridging protein and can structure DNA by binding to two separate DNA duplexes or to adjacent sites on the same duplex, depending on external conditions. Several nucleotide sequences have been identified to which H-NS binds with high affinity, indicating H-NS prefers AT-rich DNA. To date, highly detailed structural information of the H-NS DNA complex remains elusive. Molecular simulation can complement experiments by modelling structures and their time evolution in atomistic detail. In this paper we report an exploration of the different binding modes of H-NS to a high affinity nucleotide sequence and an estimate of the associated rate constant. By means of molecular dynamics simulations, we identified three types of binding for H-NS to AT-rich DNA. To further sample the transitions between these binding modes, we performed Replica Exchange Transition Interface Sampling, providing predictions of the mechanism and rate constant of H-NS binding to DNA. H-NS interacts with the DNA through a conserved QGR motif, aided by a conserved arginine at position 93. The QGR motif interacts first with phosphate groups, followed by the formation of hydrogen bonds between acceptors in the DNA minor groove and the sidechains of either Q112 or R114. After R114 inserts into the minor groove, the rest of the QGR motif follows. Full insertion of the QGR motif in the minor groove is stable over several tens of nanoseconds, and involves hydrogen bonds between the bases and both backbone and sidechains of the QGR motif. The rate constant for the process of H-NS binding to AT-rich DNA resulting in full insertion of the QGR motif is in the order of 10⁶ M⁻¹ s⁻¹, which is rate limiting compared to the non-specific association of H-NS to the DNA backbone at a rate of 10⁸ M⁻¹ s⁻¹. [ABSTRACT FROM AUTHOR]
- Published
- 2019
9. Thermodynamic model of gene regulation for the Or59b olfactory receptor in Drosophila.
- Author
- González, Alejandra, Jafari, Shadi, Zenere, Alberto, Alenius, Mattias, and Altafini, Claudio
- Subjects
- OLFACTORY receptors, GENETIC regulation, DROSOPHILA, EUKARYOTES, TRANSCRIPTION factors, THERMODYNAMICS
- Abstract
Complex eukaryotic promoters normally contain multiple cis-regulatory sequences for different transcription factors (TFs). The binding patterns of the TFs to these sites, as well as the way the TFs interact with each other and with the RNA polymerase (RNAp), lead to combinatorial problems rarely understood in detail, especially under varying epigenetic conditions. The aim of this paper is to build a model describing how the main regulatory cluster of the olfactory receptor Or59b drives transcription of this gene in Drosophila. The cluster-driven expression of this gene is represented as the equilibrium probability of RNAp being bound to the promoter region, using a statistical thermodynamic approach. The RNAp equilibrium probability is computed in terms of the occupancy probabilities of the single TFs of the cluster to the corresponding binding sites, and of the interaction rules among TFs and RNAp, using experimental data of Or59b expression to tune the model parameters. The model correctly reproduces the changes in RNAp binding probability induced by various mutations of specific sites and epigenetic modifications. Some of its predictions have also been validated in novel experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions.
- Author
- Zhang, Wen, Tang, Guifeng, Huang, Feng, Zhang, Xining, Yue, Xiang, and Wu, Wenjian
- Subjects
- RNA-protein interactions, GENETIC regulation, RNA interference, RNA splicing, ADENYLATION (Biochemistry)
- Abstract
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but features are not available for all lncRNAs or proteins; most existing methods are not capable of predicting interacting proteins (or lncRNAs) for new lncRNAs (or proteins), which don’t have known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features with a feature projection ensemble learning frame. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find novel lncRNA-protein interactions, which are confirmed by literature. Finally, we construct a user-friendly web server, available at . [ABSTRACT FROM AUTHOR]
- Published
- 2018
11. Chemical, Target, and Bioactive Properties of Allosteric Modulation.
- Author
- van Westen, Gerard J. P., Gaulton, Anna, and Overington, John P.
- Subjects
- ALLOSTERIC regulation, PROTEINS, MOLECULAR weights, CHEMICAL libraries, LIGAND binding (Biochemistry), ION channels, NUCLEAR receptors (Biochemistry)
- Abstract
Allosteric modulators are ligands for proteins that exert their effects via a different binding site than the natural (orthosteric) ligand site and hence form a conceptually distinct class of ligands for a target of interest. Here, the physicochemical and structural features of a large set of allosteric and non-allosteric ligands from the ChEMBL database of bioactive molecules are analyzed. In general allosteric modulators are relatively smaller, more lipophilic and more rigid compounds, though large differences exist between different targets and target classes. Furthermore, there are differences in the distribution of targets that bind these allosteric modulators. Allosteric modulators are over-represented in membrane receptors, ligand-gated ion channels and nuclear receptor targets, but are underrepresented in enzymes (primarily proteases and kinases). Moreover, allosteric modulators tend to bind to their targets with a slightly lower potency (5.96 log units versus 6.66 log units, p<0.01). However, this lower absolute affinity is compensated by their lower molecular weight and more lipophilic nature, leading to similar binding efficiency and surface efficiency indices. Subsequently a series of classifier models are trained, initially target class independent models followed by finer-grained target (architecture/functional class) based models using the target hierarchy of the ChEMBL database. Applications of these insights include the selection of likely allosteric modulators from existing compound collections, the design of novel chemical libraries biased towards allosteric regulators and the selection of targets potentially likely to yield allosteric modulators on screening. All data sets used in the paper are available for download. [ABSTRACT FROM AUTHOR]
- Published
- 2014
12. Predicting B cell receptor substitution profiles using public repertoire data.
- Author
- Dhar, Amrit, Davidsen, Kristian, Matsen IV, Frederick A., and Minin, Vladimir N.
- Subjects
- B cell receptors, AMINO acids, GENETIC mutation, CLONING, GERMINAL centers, IMMUNOTECHNOLOGY
- Abstract
B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same “clonal family”) are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called “substitution profiles”, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method “Substitution Profiles Using Related Families” (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package () implementing these ideas and providing easy prediction using our pre-fit models. [ABSTRACT FROM AUTHOR]
- Published
- 2018
13. The effect of cell geometry on polarization in budding yeast.
- Author
- Trogdon, Michael, Drawert, Brian, Gomez, Carlos, Banavar, Samhita P., Yi, Tau-Mu, Campàs, Otger, and Petzold, Linda R.
- Subjects
- SACCHAROMYCES cerevisiae, STEM cells, BIOLOGICAL evolution, GENETIC transcription, SYNTHETIC biology
- Abstract
The localization (or polarization) of proteins on the membrane during the mating of budding yeast (Saccharomyces cerevisiae) is an important model system for understanding simple pattern formation within cells. While there are many existing mathematical models of polarization, for both budding and mating, there are still many aspects of this process that are not well understood. In this paper we set out to elucidate the effect that the geometry of the cell can have on the dynamics of certain models of polarization. Specifically, we look at several spatial stochastic models of Cdc42 polarization that have been adapted from published models, on a variety of tip-shaped geometries, to replicate the shape change that occurs during the growth of the mating projection. We show here that there is a complex interplay between the dynamics of polarization and the shape of the cell. Our results show that while models of polarization can generate a stable polarization cap, its localization at the tip of mating projections is unstable, with the polarization cap drifting away from the tip of the projection in a geometry dependent manner. We also compare predictions from our computational results to experiments that observe cells with projections of varying lengths, and track the stability of the polarization cap. Lastly, we examine one model of actin polarization and show that it is unlikely, at least for the models studied here, that actin dynamics and vesicle traffic are able to overcome this effect of geometry. [ABSTRACT FROM AUTHOR]
- Published
- 2018
14. Structural organization and energy storage in crosslinked actin assemblies.
- Author
- Ma, Rui and Berro, Julien
- Subjects
- ACTIN, CLATHRIN, ENDOCYTOSIS, FIBERS, FIMBRIN, POLYMERIZATION
- Abstract
During clathrin-mediated endocytosis in yeast cells, short actin filaments (< 200 nm) and crosslinking protein fimbrin assemble to drive the internalization of the plasma membrane. However, the organization of the actin meshwork during endocytosis remains largely unknown. In addition, only a small fraction of the force necessary to elongate and pinch off vesicles can be accounted for by actin polymerization alone. In this paper, we used mathematical modeling to study the self-organization of rigid actin filaments in the presence of elastic crosslinkers in conditions relevant to endocytosis. We found that actin filaments condense into either a disordered meshwork or an ordered bundle depending on filament length and the mechanical and kinetic properties of the crosslinkers. Our simulations also demonstrated that these nanometer-scale actin structures can store a large amount of elastic energy within the crosslinkers (up to 10 k_B T per crosslinker). This conversion of binding energy into elastic energy is the consequence of geometric constraints created by the helical pitch of the actin filaments, which results in frustrated configurations of crosslinkers attached to filaments. We propose that this stored elastic energy can be used at a later time in the endocytic process. As a proof of principle, we presented a simple mechanism for sustained torque production by ordered detachment of crosslinkers from a pair of parallel filaments. [ABSTRACT FROM AUTHOR]
- Published
- 2018
15. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data.
- Author
- Dotu, Ivan, Adamson, Scott I., Coleman, Benjamin, Fournier, Cyril, Ricart-Altimiras, Emma, Eyras, Eduardo, and Chuang, Jeffrey H.
- Subjects
- IMMUNOPRECIPITATION, RNA-binding proteins, PROTEIN-protein interactions, NUCLEOTIDE sequence, RNA splicing
- Abstract
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. [ABSTRACT FROM AUTHOR]
- Published
- 2018
16. PCSF: An R-package for network-based interpretation of high-throughput data.
- Author
- Akhmedov, Murodzhon, Kedaigle, Amanda, Chong, Renan Escalante, Montemanni, Roberto, Bertoni, Francesco, Fraenkel, Ernest, and Kwee, Ivo
- Subjects
- BIOINFORMATICS software, DATA analysis software, MATHEMATICAL optimization, COMPUTATIONAL biology, PROTEIN-protein interactions
- Abstract
With recent technological developments, a vast amount of high-throughput data has been profiled to understand the mechanism of complex diseases. The current bioinformatics challenge is to interpret the data and underlying biology, where efficient algorithms for analyzing heterogeneous high-throughput data using biological networks are becoming increasingly valuable. In this paper, we propose a software package based on the Prize-collecting Steiner Forest graph optimization approach. The PCSF package performs fast and user-friendly network analysis of high-throughput data by mapping the data onto biological networks such as protein-protein interaction, gene-gene interaction, or any other correlation- or coexpression-based networks. Using the interaction networks as a template, it determines high-confidence subnetworks relevant to the data, which potentially leads to predictions of functional units. It also interactively visualizes the resulting subnetwork with functional enrichment analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2017
17. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
- Author
- Wang, Sheng, Sun, Siqi, Li, Zhen, Zhang, Renyu, and Xu, Jinbo
- Subjects
- PROTEIN structure, ARTIFICIAL neural networks, PROTEIN folding, PAIRED comparisons (Mathematics), AMINO acid sequence
- Abstract
Motivation: Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain high-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: [ABSTRACT FROM AUTHOR]
- Published
- 2017
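The "top L/k long-range accuracy" quoted in the abstract above can be illustrated with a small sketch. It is not the authors' evaluation code. The function name `top_lk_accuracy`, the minimum sequence separation of 24 residues for "long-range", and the choice to divide by the number of predictions actually selected are all assumptions made here for illustration.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_lk_accuracy(scores: Dict[Pair, float], native: Set[Pair],
                    length: int, k: int = 1, min_sep: int = 24) -> float:
    """Fraction of the top L/k long-range predictions that are native contacts.

    `scores` maps residue pairs to predicted contact scores, `native` holds the
    true contacts, and `length` is the sequence length L. `min_sep` is an
    assumed cutoff for "long-range"; the abstract does not state one.
    """
    # Keep only pairs separated by at least `min_sep` positions in sequence.
    long_range = [(p, s) for p, s in scores.items() if abs(p[0] - p[1]) >= min_sep]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    n_top = max(1, length // k)
    top = [p for p, _ in long_range[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for p in top if p in native)
    return hits / len(top)


# Toy data (hypothetical): three long-range pairs and one short-range pair.
toy_scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 5): 0.99, (3, 50): 0.1}
toy_native = {(0, 30), (3, 50)}
```

With the toy data, `top_lk_accuracy(toy_scores, toy_native, length=60, k=30)` evaluates the two highest-scoring long-range pairs; the short-range pair `(2, 5)` is ignored despite its high score.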
18. A Stochastic Model of the Yeast Cell Cycle Reveals Roles for Feedback Regulation in Limiting Cellular Variability.
- Author
-
Barik, Debashis, Ball, David A., Peccoud, Jean, and Tyson, John J.
- Subjects
CELL cycle ,CELL division ,CYCLIN-dependent kinases ,PROTEIN kinases ,CYCLINS - Abstract
The cell division cycle of eukaryotes is governed by a complex network of cyclin-dependent protein kinases (CDKs) and auxiliary proteins that govern CDK activities. The control system must function reliably in the context of molecular noise that is inevitable in tiny yeast cells, because mistakes in sequencing cell cycle events are detrimental or fatal to the cell or its progeny. To assess the effects of noise on cell cycle progression requires not only extensive, quantitative, experimental measurements of cellular heterogeneity but also comprehensive, accurate, mathematical models of stochastic fluctuations in the CDK control system. In this paper we provide a stochastic model of the budding yeast cell cycle that accurately accounts for the variable phenotypes of wild-type cells and more than 20 mutant yeast strains simulated in different growth conditions. We specifically tested the role of feedback regulations mediated by G1- and SG2M-phase cyclins to minimize the noise in cell cycle progression. Details of the model are informed and tested by quantitative measurements (by fluorescence in situ hybridization) of the joint distributions of mRNA populations in yeast cells. We use the model to predict the phenotypes of ~30 mutant yeast strains that have not yet been characterized experimentally. [ABSTRACT FROM AUTHOR]
- Published
- 2016
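To give a feel for what a stochastic model of mRNA populations involves, here is a minimal sketch of a single-species birth-death process simulated with Gillespie's direct method. The abstract above does not say which simulation algorithm or reaction scheme the authors use, so the algorithm choice, the function name `simulate_mrna`, and the rate constants are all assumptions for illustration only.

```python
import random


def simulate_mrna(k_syn: float, k_deg: float, t_end: float, seed: int = 0) -> int:
    """Return the mRNA copy number at time t_end for one simulated cell.

    Reactions (assumed toy scheme): synthesis at constant rate k_syn (> 0),
    degradation at rate k_deg per molecule.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    while True:
        a_syn = k_syn          # propensity of synthesis
        a_deg = k_deg * n      # propensity of degradation
        a_total = a_syn + a_deg
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            return n
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1


# Copy numbers across a small simulated population (one seed per cell).
population = [simulate_mrna(10.0, 1.0, 20.0, seed=s) for s in range(200)]
```

For this toy scheme the stationary distribution is Poisson with mean `k_syn / k_deg`, so the spread of `population` is a simple example of the cell-to-cell variability that such models aim to quantify.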
19. In Silico Knockout Studies of Xenophagic Capturing of Salmonella.
- Author
-
Scheidel, Jennifer, Amstein, Leonie, Ackermann, Jörg, Dikic, Ivan, and Koch, Ina
- Subjects
SALMONELLA detection ,SALMONELLA typhimurium ,PATHOGENIC microorganisms ,EPITHELIAL cells ,PLANT vacuoles - Abstract
The degradation of cytosol-invading pathogens by autophagy, a process known as xenophagy, is an important mechanism of the innate immune system. Inside the host, Salmonella Typhimurium invades epithelial cells and resides within a specialized intracellular compartment, the Salmonella-containing vacuole. A fraction of these bacteria does not persist inside the vacuole and enters the host cytosol. Salmonella Typhimurium that invades the host cytosol becomes a target of the autophagy machinery for degradation. The xenophagy pathway has recently been discovered, and the exact molecular processes are not entirely characterized. Complete kinetic data for each molecular process is not available, so far. We developed a mathematical model of the xenophagy pathway to investigate this key defense mechanism. In this paper, we present a Petri net model of Salmonella xenophagy in epithelial cells. The model is based on functional information derived from literature data. It comprises the molecular mechanism of galectin-8-dependent and ubiquitin-dependent autophagy, including regulatory processes, like nutrient-dependent regulation of autophagy and TBK1-dependent activation of the autophagy receptor, OPTN. To model the activation of TBK1, we proposed a new mechanism of TBK1 activation, suggesting a spatial and temporal regulation of this process. Using standard Petri net analysis techniques, we found basic functional modules, which describe different pathways of the autophagic capture of Salmonella and reflect the basic dynamics of the system. To verify the model, we performed in silico knockout experiments. We introduced a new concept of knockout analysis to systematically compute and visualize the results, using an in silico knockout matrix. The results of the in silico knockout analyses were consistent with published experimental results and provide a basis for future investigations of the Salmonella xenophagy pathway. [ABSTRACT FROM AUTHOR]
- Published
- 2016
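The idea of an in silico knockout matrix can be sketched on a toy net. This is a deliberately simplified stand-in, not the authors' analysis: it removes one transition at a time and checks which places can still be marked, ignoring token consumption. The net `TOY_NET`, its place and transition names, and the functions `reachable_places` and `knockout_matrix` are hypothetical and do not reproduce the published model.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A transition is a pair (input places, output places).
Transition = Tuple[FrozenSet[str], FrozenSet[str]]


def reachable_places(net: Dict[str, Transition], initial: Set[str]) -> Set[str]:
    """Places that can become marked, ignoring token consumption (simplification)."""
    marked = set(initial)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in net.values():
            # A transition can fire once all of its input places are marked.
            if inputs <= marked and not outputs <= marked:
                marked |= outputs
                changed = True
    return marked


def knockout_matrix(net: Dict[str, Transition],
                    initial: Set[str]) -> Dict[str, Dict[str, bool]]:
    """Row per knocked-out transition, column per place: is the place still reachable?"""
    places = sorted(set(initial).union(*(i | o for i, o in net.values())))
    matrix = {}
    for knocked_out in net:
        reduced = {name: io for name, io in net.items() if name != knocked_out}
        still_marked = reachable_places(reduced, initial)
        matrix[knocked_out] = {p: p in still_marked for p in places}
    return matrix


# Hypothetical toy net with two redundant capture routes.
TOY_NET: Dict[str, Transition] = {
    "ubiquitination": (frozenset({"cytosolic_bacterium"}), frozenset({"ubiquitinated"})),
    "galectin8_binding": (frozenset({"cytosolic_bacterium"}), frozenset({"galectin8_bound"})),
    "capture_via_ubiquitin": (frozenset({"ubiquitinated"}), frozenset({"captured"})),
    "capture_via_galectin8": (frozenset({"galectin8_bound"}), frozenset({"captured"})),
}
TOY_MATRIX = knockout_matrix(TOY_NET, {"cytosolic_bacterium"})
```

In this toy net, knocking out `ubiquitination` leaves `captured` reachable through the other route, which is the kind of redundancy a knockout matrix makes visible at a glance.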
20. Computational Discovery of Putative Leads for Drug Repositioning through Drug-Target Interaction Prediction.
- Author
-
Coelho, Edgar D., Arrais, Joel P., and Oliveira, José Luís
- Subjects
BACTERIAL organelles ,ORGANISMS ,DRUG development ,MOLECULAR docking ,DRUGS - Abstract
De novo experimental drug discovery is an expensive and time-consuming task. It requires the identification of drug-target interactions (DTIs) towards targets of biological interest, either to inhibit or enhance a specific molecular function. Dedicated computational models for protein simulation and DTI prediction are crucial for speed and to reduce the costs associated with DTI identification. In this paper we present a computational pipeline that enables the discovery of putative leads for drug repositioning that can be applied to any microbial proteome, as long as the interactome of interest is at least partially known. Network metrics calculated for the interactome of the bacterial organism of interest were used to identify putative drug-targets. Then, a random forest classification model for DTI prediction was constructed using known DTI data from publicly available databases, resulting in an area under the ROC curve of 0.91 for classification of out-of-sampling data. A drug-target network was created by combining 3,081 unique ligands and the expected ten best drug targets. This network was used to predict new DTIs and to calculate the probability of the positive class, allowing the scoring of the predicted instances. Molecular docking experiments were performed on the best scoring DTI pairs and the results were compared with those of the same ligands with their original targets. The results obtained suggest that the proposed pipeline can be used in the identification of new leads for drug repositioning. The proposed classification model is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2016
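The area under the ROC curve reported in the abstract above has a simple rank interpretation: it is the probability that a randomly chosen positive pair is scored higher than a randomly chosen negative pair. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, unrelated to the authors' data or classifier.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): predicted interaction probabilities for four pairs.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic loop is fine for small held-out sets; for large ones a sort-based rank formula gives the same value more efficiently.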
21. Likelihood-Based Inference of B Cell Clonal Families.
- Author
-
Ralph, Duncan K. and Matsen IV, Frederick A.
- Subjects
B cells ,CLONE cells ,B cell receptors ,MARKOV processes - Abstract
The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called “rearrangement” forming progenitor B cells, then a Darwinian process of lineage diversification and selection called “affinity maturation.” The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous, or may consist of only a single member. As a step to understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e. to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem “clonal family inference.” In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
22. Quorum-Sensing Synchronization of Synthetic Toggle Switches: A Design Based on Monotone Dynamical Systems Theory.
- Author
-
Nikolaev, Evgeni V. and Sontag, Eduardo D.
- Subjects
QUORUM sensing ,CELL populations ,SYNTHETIC biology ,GENE expression ,PHYSICAL sciences ,FLOW cytometry - Abstract
Synthetic constructs in biotechnology, biocomputing, and modern gene therapy interventions are often based on plasmids or transfected circuits which implement some form of “on-off” switch. For example, the expression of a protein used for therapeutic purposes might be triggered by the recognition of a specific combination of inducers (e.g., antigens), and memory of this event should be maintained across a cell population until a specific stimulus commands a coordinated shut-off. The robustness of such a design is hampered by molecular (“intrinsic”) or environmental (“extrinsic”) noise, which may lead to spontaneous changes of state in a subset of the population and is reflected in the bimodality of protein expression, as measured for example using flow cytometry. In this context, a “majority-vote” correction circuit, which brings deviant cells back into the required state, is highly desirable, and quorum-sensing has been suggested as a way for cells to broadcast their states to the population as a whole so as to facilitate consensus. In this paper, we propose what we believe is the first such design that has mathematically guaranteed properties of stability and auto-correction under certain conditions. Our approach is guided by concepts and theory from the field of “monotone” dynamical systems developed by M. Hirsch, H. Smith, and others. We benchmark our design by comparing it to an existing design which has been the subject of experimental and theoretical studies, illustrating its superiority in stability and self-correction of synchronization errors. Our stability analysis, based on dynamical systems theory, guarantees global convergence to steady states, ruling out unpredictable (“chaotic”) behaviors and even sustained oscillations in the limit of convergence. These results hold whatever the parameter values, and are based only on the wiring diagram. 
The theory is complemented by extensive computational bifurcation analysis, performed for a biochemically-detailed and biologically-relevant model that we developed. Another novel feature of our approach is that our theorems on exponential stability of steady states for homogeneous or mixed populations are valid independently of the number N of cells in the population, which is usually very large (N ≫ 1) and unknown. We prove that the exponential stability depends on relative proportions of each type of state only. While monotone systems theory has been used previously for systems biology analysis, the current work illustrates its power for synthetic biology design, and thus has wider significance well beyond the application to the important problem of coordination of toggle switches. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
23. Simulation and Theory of Antibody Binding to Crowded Antigen-Covered Surfaces.
- Author
-
De Michele, Cristiano, De Los Rios, Paolo, Foffi, Giuseppe, and Piazza, Francesco
- Subjects
ANTIGENS ,IMMUNOGLOBULIN G ,SURFACE plasmon resonance ,HAPTENS ,IMMUNE system - Abstract
In this paper we introduce a fully flexible coarse-grained model of immunoglobulin G (IgG) antibodies parametrized directly on cryo-EM data and simulate the binding dynamics of many IgGs to antigens adsorbed on a surface at increasing densities. Moreover, we work out a theoretical model that allows us to explain all the features observed in the simulations. Our combined computational and theoretical framework is in excellent agreement with surface-plasmon resonance data and allows us to establish a number of important results. (i) Internal flexibility is key to maximizing bivalent binding, flexible IgGs being able to explore the surface with their second arm in search of an available hapten. This is made clear by the strongly reduced ability to bind with both arms displayed by artificial IgGs designed to rigidly keep a prescribed shape. (ii) The large size of IgGs is instrumental in keeping neighboring molecules at a certain distance (surface repulsion), which essentially makes antigens within reach of the second Fab always unoccupied on average. (iii) One needs to account independently for the thermodynamic and geometric factors that regulate the binding equilibrium. The key geometrical parameters, besides excluded-volume repulsion, describe the screening of free haptens by neighboring bound antibodies. We prove that the thermodynamic parameters govern the low-antigen-concentration regime, while the surface screening and repulsion only affect the binding at high hapten densities. Importantly, we prove that screening effects are concealed in relative measures, such as the fraction of bivalently bound antibodies. Overall, our model provides a valuable, accurate theoretical paradigm beyond existing frameworks to interpret experimental profiles of antibodies binding to multi-valent surfaces of different sorts in many contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2016
24. Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction.
- Author
-
Liu, Yong, Wu, Min, Miao, Chunyan, Zhao, Peilin, and Li, Xiao-Li
- Subjects
TARGETED drug delivery ,DRUG interactions ,FACTORIZATION ,DRUG design ,PREDICTION models - Abstract
In pharmaceutical sciences, a crucial step of the drug discovery process is the identification of drug-target interactions. However, only a small portion of the drug-target interactions have been experimentally validated, as the experimental validation is laborious and costly. To improve the drug discovery efficiency, there is a great need for the development of accurate computational approaches that can predict potential drug-target interactions to direct the experimental verification. In this paper, we propose a novel drug-target interaction prediction algorithm, namely neighborhood regularized logistic matrix factorization (NRLMF). Specifically, the proposed NRLMF method focuses on modeling the probability that a drug would interact with a target by logistic matrix factorization, where the properties of drugs and targets are represented by drug-specific and target-specific latent vectors, respectively. Moreover, NRLMF assigns higher importance levels to positive observations (i.e., the observed interacting drug-target pairs) than negative observations (i.e., the unknown pairs). Because the positive observations are already experimentally verified, they are usually more trustworthy. Furthermore, the local structure of the drug-target interaction data has also been exploited via neighborhood regularization to achieve better prediction accuracy. We conducted extensive experiments over four benchmark datasets, and NRLMF demonstrated its effectiveness compared with five state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2016
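The core of the logistic matrix factorization described in the abstract above can be sketched in a few lines: the interaction probability is a logistic function of the inner product of a drug latent vector and a target latent vector, and observed interactions are given a larger weight than unknown pairs. The sketch omits the neighborhood regularization and any training loop, and the function names and the weight value `c=5.0` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def interaction_probability(u: Sequence[float], v: Sequence[float]) -> float:
    """Logistic function of the inner product of drug vector u and target vector v."""
    x = sum(a * b for a, b in zip(u, v))
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weighted_log_likelihood(Y: Sequence[Sequence[int]],
                            U: Sequence[Sequence[float]],
                            V: Sequence[Sequence[float]],
                            c: float = 5.0) -> float:
    """Log-likelihood in which each observed interaction (y = 1) counts c times.

    Y[i][j] is 1 for a known drug-target interaction and 0 for an unknown pair;
    U[i] and V[j] are the latent vectors. Regularization terms are omitted.
    """
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            p = interaction_probability(U[i], V[j])
            total += c * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

Setting `c` above 1 encodes the point made in the abstract that experimentally verified positives are more trustworthy than unknown pairs, which are treated as negatives only provisionally.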
25. Improving Contact Prediction along Three Dimensions.
- Author
-
Feinauer, Christoph, Skwark, Marcin J., Pagnani, Andrea, and Aurell, Erik
- Subjects
STATISTICAL correlation ,NUCLEOTIDE sequencing ,HOMOLOGOUS chromosomes ,PROTEINS ,SCIENTIFIC knowledge - Abstract
Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we in this paper refer to as the three dimensions of contact prediction, is to (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose a predictive model to describe a sequence alignment; (iii) infer the model parameters and interpret them in terms of structural properties, such as an accurate contact map. We show here that all three dimensions are important for overall prediction success. In particular, we show that it is possible to improve significantly along the second dimension by going beyond the pair-wise Potts models from statistical physics, which have hitherto been the focus of the field. These (simple) extensions are motivated by multiple sequence alignments often containing long stretches of gaps which, as a data feature, would be rather untypical for independent samples drawn from a Potts model. Using a large test set of proteins we show that the combined improvements along the three dimensions are as large as any reported to date. [ABSTRACT FROM AUTHOR]
- Published
- 2014
26. VASP-E: Specificity Annotation with a Volumetric Analysis of Electrostatic Isopotentials.
- Author
-
Chen, Brian Y.
- Subjects
PROTEIN structure ,ELECTROSTATICS ,VOLUMETRIC analysis ,COMPUTER algorithms ,COMPARATIVE studies ,PROTEIN binding - Abstract
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. [ABSTRACT FROM AUTHOR]
- Published
- 2014
27. Analysis of the Protein Domain and Domain Architecture Content in Fungi and Its Application in the Search of New Antifungal Targets.
- Author
-
Barrera, Alejandro, Alastruey-Izquierdo, Ana, Martín, María J., Cuesta, Isabel, and Vizcaíno, Juan Antonio
- Subjects
PROTEIN structure ,ANTIFUNGAL agents ,MYCOSES ,MORTALITY ,DRUG development - Abstract
Over the past several years fungal infections have shown an increasing incidence in the susceptible population, and caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. Therefore, the identification of new potential antifungal targets is a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) by January 2013. The resulting list of core and exclusive domain and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. In addition, we also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes, and identified those domains and domain architectures only present in fungi. Secondly, we looked for information regarding fungal pathways in public repositories, where proteins containing promiscuous domains could be involved. Three pathways were identified as a result: lovastatin biosynthesis, xylan degradation and biosynthesis of siroheme. Finally, we classified a subset of the studied fungi in five groups depending on their occurrence in clinical samples. We then looked for exclusive domains in the groups that were more relevant clinically and determined which of them had the potential to bind small molecules. Overall, this study provides a comprehensive analysis of the available fungal proteomes and shows three approaches that can be used as a first step in the detection of new antifungal targets. [ABSTRACT FROM AUTHOR]
- Published
- 2014
28. Stability Curve Prediction of Homologous Proteins Using Temperature-Dependent Statistical Potentials.
- Author
-
Pucci, Fabrizio and Rooman, Marianne
- Subjects
PROTEIN analysis ,PREDICTION models ,BIOPHYSICS ,THERMAL stability ,FREE energy (Thermodynamics) - Abstract
The unraveling and control of protein stability at different temperatures is a fundamental problem in biophysics that is substantially far from being quantitatively and accurately solved, as it requires a precise knowledge of the temperature dependence of amino acid interactions. In this paper we attempt to gain insight into the thermal stability of proteins by designing a tool to predict the full stability curve as a function of the temperature for a set of 45 proteins belonging to 11 homologous families, given their sequence and structure, as well as the melting temperature (Tm) and the change in heat capacity (ΔCp) of proteins belonging to the same family. Stability curves constitute a fundamental instrument to analyze in detail the thermal stability and its relation to the thermodynamic stability, and to estimate the enthalpic and entropic contributions to the folding free energy. In summary, our approach for predicting the protein stability curves relies on temperature-dependent statistical potentials derived from three datasets of protein structures with targeted thermal stability properties. Using these potentials, the folding free energies (ΔG) at three different temperatures were computed for each protein. The Gibbs-Helmholtz equation was then used to predict the protein's stability curve as the curve that best fits these three points. The results are quite encouraging: the standard deviations between the experimental and predicted Tm values, ΔCp values and folding free energies at room temperature are equal to 13 °C, 1.3 kcal/(mol °C) and 4.1 kcal/mol, respectively, in cross-validation. The main sources of error and some further improvements and perspectives are briefly discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2014
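The Gibbs-Helmholtz stability curve mentioned in the abstract above is commonly written as ΔG(T) = ΔHm (1 - T/Tm) - ΔCp [(Tm - T) + T ln(T/Tm)], where Tm is the melting temperature, ΔHm the enthalpy change at Tm, and ΔCp the heat capacity change, assumed independent of temperature. The sketch below only evaluates this curve; it does not reproduce the authors' fitting procedure, and the function name and the numerical values in the example are hypothetical.

```python
import math


def gibbs_helmholtz(temp_k: float, tm_k: float, dh_m: float, dcp: float) -> float:
    """Free energy change at temperature temp_k (kelvin).

    dh_m is the enthalpy change at the melting temperature tm_k and dcp the
    heat capacity change. The result refers to the same direction (folding or
    unfolding) as dh_m and dcp, in the energy units of dh_m.
    """
    enthalpic = dh_m * (1.0 - temp_k / tm_k)
    heat_capacity = dcp * ((tm_k - temp_k) + temp_k * math.log(temp_k / tm_k))
    return enthalpic - heat_capacity


# Hypothetical unfolding parameters: Tm = 330 K, dH_m = 100 kcal/mol,
# dCp = 2 kcal/(mol K). These are illustrative, not values from the paper.
dg_room = gibbs_helmholtz(298.0, 330.0, 100.0, 2.0)
dg_melt = gibbs_helmholtz(330.0, 330.0, 100.0, 2.0)
```

By construction the curve passes through zero at Tm, and with a positive ΔCp for unfolding it bends downward on both sides of its maximum, which is why three computed points are enough to pin down its three parameters.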
29. Switchable slow cellular conductances determine robustness and tunability of network states.
- Author
-
Drion, Guillaume, Dethier, Julie, Franci, Alessio, and Sepulchre, Rodolphe
- Subjects
COMPUTATIONAL biology ,BRAIN ,NEURONS ,SPATIOTEMPORAL processes ,CELLS ,CALCIUM channels - Abstract
Neuronal information processing is regulated by fast and localized fluctuations of brain states. Brain states reliably switch between distinct spatiotemporal signatures at a network scale even though they are composed of heterogeneous and variable rhythms at a cellular scale. We investigated the mechanisms of this network control in a conductance-based population model that reliably switches between active and oscillatory mean-fields. Robust control of the mean-field properties relies critically on a switchable negative intrinsic conductance at the cellular level. This conductance endows circuits with a shared cellular positive feedback that can switch population rhythms on and off at a cellular resolution. The switch is largely independent from other intrinsic neuronal properties, network size and synaptic connectivity. It is therefore compatible with the temporal variability and spatial heterogeneity induced by slower regulatory functions such as neuromodulation, synaptic plasticity and homeostasis. Strikingly, the required cellular mechanism is available in all cell types that possess T-type calcium channels but unavailable in computational models that neglect the slow kinetics of their activation. [ABSTRACT FROM AUTHOR]
- Published
- 2018
30. Including Thermal Fluctuations in Actomyosin Stable States Increases the Predicted Force per Motor and Macroscopic Efficiency in Muscle Modelling.
- Author
-
Marcucci, Lorenzo, Washio, Takumi, and Yanagida, Toshio
- Subjects
ACTOMYOSIN ,MYOSIN ,MUSCLE contraction ,POTENTIAL energy ,MOLECULAR force constants - Abstract
Muscle contractions are generated by cyclical interactions of myosin heads with actin filaments to form the actomyosin complex. To simulate actomyosin complex stable states, mathematical models usually define an energy landscape with a corresponding number of wells. The jumps between these wells are defined through rate constants. Almost all previous models assign these wells an infinite sharpness by imposing a relatively simple expression for the detailed balance, i.e., the ratio of the rate constants depends exponentially on the sole myosin elastic energy. Physically, this assumption corresponds to neglecting thermal fluctuations in the actomyosin complex stable states. By comparing three mathematical models, we examine the extent to which this hypothesis affects muscle model predictions at the single cross-bridge, single fiber, and organ levels in a ceteris paribus analysis. We show that including fluctuations in stable states allows the lever arm of the myosin to easily and dynamically explore all possible minima in the energy landscape, generating several backward and forward jumps between states during the lifetime of the actomyosin complex, whereas the infinitely sharp minima case is characterized by fewer jumps between states. Moreover, the analysis predicts that thermal fluctuations enable a more efficient contraction mechanism, in which a higher force is sustained by fewer attached cross-bridges. [ABSTRACT FROM AUTHOR]
- Published
- 2016
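The "relatively simple expression for the detailed balance" referred to in the abstract above can be written as k_forward / k_backward = exp(-(E_to - E_from) / kBT), where E is the elastic energy of the myosin in each state. The following sketch shows that relation only; the harmonic spring energy, the function names, and the stiffness and displacement values in the example are assumptions for illustration and are not taken from the paper. The thermal energy of about 4.1 pN·nm corresponds to room temperature.

```python
import math

KBT_PN_NM = 4.1  # thermal energy at room temperature, in pN*nm (approximate)


def elastic_energy(stiffness_pn_per_nm: float, stretch_nm: float) -> float:
    """Energy of a linear spring, E = 0.5 * kappa * x^2 (assumed elastic model)."""
    return 0.5 * stiffness_pn_per_nm * stretch_nm ** 2


def backward_rate(k_forward: float, e_from: float, e_to: float,
                  kbt: float = KBT_PN_NM) -> float:
    """Backward rate fixed by detailed balance for a jump from e_from to e_to."""
    # k_forward / k_backward = exp(-(e_to - e_from) / kbt)
    return k_forward * math.exp((e_to - e_from) / kbt)


# Hypothetical example: a jump that relaxes the spring from 4 nm to 2 nm stretch.
e_before = elastic_energy(2.0, 4.0)   # 16 pN*nm
e_after = elastic_energy(2.0, 2.0)    # 4 pN*nm
k_back = backward_rate(100.0, e_before, e_after)
```

Because the ratio depends only on the two elastic energies, each well is treated as infinitely sharp, which is the assumption the paper sets out to relax by including thermal fluctuations within the stable states.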
31. Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data
- Author
-
Duncan Ralph and Frederick A. Matsen
- Subjects
0301 basic medicine ,Physiology ,Inference ,Biochemistry ,Germline ,Database and Informatics Methods ,0302 clinical medicine ,Immune Physiology ,Databases, Genetic ,Medicine and Health Sciences ,Biology (General) ,Data Management ,Genetics ,0303 health sciences ,Immune System Proteins ,Ecology ,Genes, Immunoglobulin ,High-Throughput Nucleotide Sequencing ,Phylogenetic Analysis ,3. Good health ,Phylogenetics ,Computational Theory and Mathematics ,Modeling and Simulation ,Mutation (genetic algorithm) ,Sequence Analysis ,Research Article ,Computer and Information Sciences ,QH301-705.5 ,Bioinformatics ,B-cell receptor ,Immunology ,Sequence Databases ,Receptors, Antigen, B-Cell ,Sequence alignment ,Computational biology ,Biology ,Research and Analysis Methods ,Deep sequencing ,Antibodies ,Set (abstract data type) ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Sequence Motif Analysis ,Point Mutation ,Humans ,Evolutionary Systematics ,Computer Simulation ,Allele ,Quantitative Biology - Populations and Evolution ,Molecular Biology ,Gene ,Ecology, Evolution, Behavior and Systematics ,Alleles ,030304 developmental biology ,Sequence (medicine) ,Taxonomy ,Evolutionary Biology ,Models, Genetic ,Populations and Evolution (q-bio.PE) ,Models, Immunological ,Biology and Life Sciences ,Proteins ,Computational Biology ,030104 developmental biology ,Biological Databases ,Germ Cells ,Genetic Loci ,FOS: Biological sciences ,Mutation ,Sequence Alignment ,030217 neurology & neurosurgery ,Software ,030215 immunology - Abstract
The collection of immunoglobulin genes in an individual’s germline, which gives rise to B cell receptors via recombination, is known to vary significantly across individuals. In humans, for example, each individual has only a fraction of the several hundred known V alleles. Furthermore, the currently-accepted set of known V alleles is both incomplete (particularly for non-European samples), and contains a significant number of spurious alleles. The resulting uncertainty as to which immunoglobulin alleles are present in any given sample results in inaccurate B cell receptor sequence annotations, and in particular inaccurate inferred naive ancestors. In this paper we first show that the currently widespread practice of aligning each sequence to its closest match in the full set of IMGT alleles results in a very large number of spurious alleles that are not in the sample’s true set of germline V alleles. We then describe a new method for inferring each individual’s germline gene set from deep sequencing data, and show that it improves upon existing methods by making a detailed comparison on a variety of simulated and real data samples. This new method has been integrated into the partis annotation and clonal family inference package, available at https://github.com/psathyrella/partis, and is run by default without affecting overall run time. Author summary: Antibodies are an important component of the adaptive immune system, which itself determines our response to both pathogens and vaccines. They are produced by B cells through somatic recombination of germline DNA, which results in a vast diversity of antigen binding affinities across the B cell repertoire. We typically learn about the development of this repertoire, and its history of interaction with antigens, by sequencing large numbers of the DNA sequences from which antibodies are derived. 
In order to understand such data, it is necessary to determine the combination of germline V, D, and J genes that was rearranged to form each such B cell receptor sequence. This is difficult, however, because the immunoglobulin locus exhibits an extraordinary level of diversity across individuals—encompassing both allelic variation and gene duplication, deletion, and conversion—and because the locus’s large size and repetitive structure make germline sequencing very difficult. In this paper we describe a new computational method that avoids this difficulty by inferring each individual’s set of immunoglobulin germline genes directly from expressed B cell receptor sequence data.
- Published
- 2019
32. Wisdom of crowds in computational biology.
- Author
-
Papin, Jason A. and Mac Gabhann, Feilim
- Subjects
MEDICAL publishing ,MACHINE learning ,ARTIFICIAL intelligence in medicine ,COMPUTATIONAL biology ,INDIVIDUALIZED medicine - Abstract
The authors comment on the breadth of research published in the journal at the intersection of machine learning and health and biology. Topics covered include the application of machine learning to health and biology, the hope that data-driven strategies will lead to a richer understanding of biological mechanisms, and cross-journal initiatives aimed at exploring how disciplines can be brought together to tackle problems in computational biology and precision medicine.
- Published
- 2019
33. A metabolic core model elucidates how enhanced utilization of glucose and glutamine, with enhanced glutamine-dependent lactate production, promotes cancer cell growth
- Author
-
Damiani, Chiara, Colombo, Riccardo, Gaglio, Daniela, Mastroianni, Fabrizia, Pescini, Dario, Westerhoff, Hans Victor, Mauri, Giancarlo, Vanoni, Marco, and Alberghina, Lilia; Synthetic Systems Biology (SILS, FNWI), Molecular Cell Physiology, and AIMMS
- Subjects
Metabolic Processes ,0301 basic medicine ,Glucose uptake ,Glutamine ,Biochemistry ,7. Clean energy ,Glucose Metabolism ,Drug Metabolism ,Metabolic Flux Analysi ,Neoplasms ,Metabolic flux analysis ,Medicine and Health Sciences ,Amino Acids ,lcsh:QH301-705.5 ,Ecology ,Organic Compounds ,Acidic Amino Acids ,Monosaccharides ,Ketones ,Enzymes ,Flux balance analysis ,Chemistry ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Carbohydrate Metabolism ,Oxidoreductases ,Metabolic Networks and Pathways ,Research Article ,Chemical Elements ,Human ,Pyruvate ,Citric Acid Cycle ,Carbohydrates ,Biology ,Models, Biological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetic ,Manchester Institute of Biotechnology ,Genetics ,Animals ,Humans ,Pharmacokinetics ,Computer Simulation ,Lactic Acid ,Molecular Biology ,Dehydrogenases ,Ecology, Evolution, Behavior and Systematics ,Cell Proliferation ,Pharmacology ,Organic Chemistry ,Chemical Compounds ,Biology and Life Sciences ,Proteins ,Metabolic Networks and Pathway ,Metabolism ,ResearchInstitutes_Networks_Beacons/manchester_institute_of_biotechnology ,Metabolic Flux Analysis ,Oxygen ,Citric acid cycle ,Metabolic pathway ,030104 developmental biology ,Glucose ,lcsh:Biology (General) ,Enzymology ,Acids ,Flux (metabolism) - Abstract
Cancer cells share several metabolic traits, including aerobic production of lactate from glucose (Warburg effect), extensive glutamine utilization and impaired mitochondrial electron flow. It is still unclear how these metabolic rearrangements, which may involve different molecular events in different cells, contribute to a selective advantage for cancer cell proliferation. To ascertain which metabolic pathways are used to convert glucose and glutamine to balanced energy and biomass production, we performed systematic constraint-based simulations of a model of human central metabolism. Sampling of the feasible flux space allowed us to obtain a large number of randomly mutated cells simulated at different glutamine and glucose uptake rates. We observed that, in the limited subset of proliferating cells, most displayed fermentation of glucose to lactate in the presence of oxygen. At high utilization rates of glutamine, oxidative utilization of glucose was decreased, while the production of lactate from glutamine was enhanced. This emergent phenotype was observed only when the available carbon exceeded the amount that could be fully oxidized by the available oxygen. Under the latter conditions, standard Flux Balance Analysis indicated that: this metabolic pattern is optimal to maximize biomass and ATP production; it requires the activity of a branched TCA cycle, in which glutamine-dependent reductive carboxylation cooperates to the production of lipids and proteins; it is sustained by a variety of redox-controlled metabolic reactions. In a K-ras transformed cell line we experimentally assessed glutamine-induced metabolic changes. We validated computational results through an extension of Flux Balance Analysis that allows prediction of metabolite variations. 
Taken together these findings offer new understanding of the logic of the metabolic reprogramming that underlies cancer cell growth.
Author summary: Hallmarks describing common key events in initiation, maintenance and progression of cancer have been identified. One hallmark deals with rewiring of metabolic reactions required to sustain enhanced cell proliferation. The availability of molecular, mechanistic models of cancer hallmarks will greatly improve optimized personal treatment and new drug discovery. Metabolism is the only hallmark for which it is currently possible to derive large-scale mathematical models that have predictive ability. In this paper, we exploit a constraint-based model of the core metabolism required for biomass conversion of the most relevant nutrients, glucose and glutamine, to clarify the logic of control of cancer metabolism. We newly report that, when available oxygen is not sufficient to fully oxidize available glucose and glutamine carbons (a situation compatible with that observed under normal oxygen conditions in human and in cancer cells growing in vitro), utilization of glutamine by reductive carboxylation and conversion of glucose and glutamine to lactate confer an advantage for biomass production. Redox homeostasis can be maintained through the use of different alternative pathways. In conclusion, this paper offers a logical interpretation of the link between metabolic rewiring and enhanced proliferation, which may offer new approaches to targeted drug discovery and utilization.
- Published
- 2017
34. Personalized glucose forecasting for type 2 diabetes using data assimilation
- Author
-
David J. Albers, George Hripcsak, Matthew E. Levine, Lena Mamykina, Henry N. Ginsberg, and Bruce J. Gluckman
- Subjects
Blood Glucose ,Male ,Patient-Specific Modeling ,Computer science ,Physiology ,computer.software_genre ,Biochemistry ,0302 clinical medicine ,Data assimilation ,Endocrinology ,Mathematical and Statistical Techniques ,Medicine and Health Sciences ,Diabetes diagnosis and management ,Insulin ,030212 general & internal medicine ,lcsh:QH301-705.5 ,Computational model ,Ecology ,Organic Compounds ,Monosaccharides ,Non-insulin-dependent diabetes--Nutritional aspects ,Regression ,Blood Sugar ,3. Good health ,Type 2 Diabetes ,Body Fluids ,Chemistry ,Blood ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Female ,Anatomy ,Algorithms ,Statistics (Mathematics) ,Research Article ,Adult ,HbA1c ,Endocrine Disorders ,Carbohydrates ,030209 endocrinology & metabolism ,Bayesian inference ,Machine learning ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetics ,Diabetes Mellitus ,Humans ,Hemoglobin ,Statistical Methods ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Simulation ,Glycemic ,Nutrition ,Biology and life sciences ,business.industry ,Model selection ,Organic Chemistry ,Chemical Compounds ,Correction ,Computational Biology ,Proteins ,Kalman filter ,Diagnostic medicine ,Glucose ,Diabetes Mellitus, Type 2 ,lcsh:Biology (General) ,Metabolic Disorders ,Artificial intelligence ,business ,computer ,Mathematics ,Forecasting - Abstract
Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast an impact of a given meal on an individual’s blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to make a personalized model selection for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. 
The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed in accuracy expert forecasts. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges.
Author summary: Type 2 diabetes is a devastating disease that requires constant patient self-management of glucose, insulin, nutrition and exercise. Nevertheless, glucose and insulin dynamics are complicated, nonstationary, nonlinear, and individual-dependent, making self-management of diabetes a complex task. To help alleviate some of the difficulty for patients, we develop a method for personalized, real-time, glucose forecasting based on nutrition. Specifically, we create and evaluate the computational machinery based on both Gaussian process models and data assimilation that leverages the physiologic knowledge of two mechanistic models to produce a personalized, nutrition-based glucose forecast for individuals with type 2 diabetes in real time that is robust to sparse data and nonstationary patients. Our computational engine was conceived to be of potential use for diabetes self-management.
- Published
- 2017
35. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
- Author
-
Haixu Tang, Sujun Li, and Yuzhen Ye
- Subjects
0301 basic medicine ,Proteomics ,Peptide ,Plant Science ,Biochemistry ,De Bruijn graph ,Database and Informatics Methods ,Tandem Mass Spectrometry ,Database Searching ,Photosynthesis ,lcsh:QH301-705.5 ,chemistry.chemical_classification ,Ecology ,Plant Biochemistry ,Microbiota ,Genomics ,6. Clean water ,Computational Theory and Mathematics ,Modeling and Simulation ,symbols ,Sequence Analysis ,Algorithms ,Research Article ,Gene prediction ,Sequence Databases ,Computational biology ,Biology ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,symbols.namesake ,Genetics ,Ribulose-1,5-Bisphosphate Carboxylase Oxygenase ,Humans ,Molecular Biology Techniques ,Sequencing Techniques ,Sequence Similarity Searching ,Gene Prediction ,Gene ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Sequence Assembly Tools ,Biology and Life Sciences ,Computational Biology ,Proteins ,Genome Analysis ,030104 developmental biology ,Biological Databases ,lcsh:Biology (General) ,chemistry ,Metagenomics ,Metaproteomics ,Protein identification ,Peptides - Abstract
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. 
Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.
Author summary: In recent years, meta-omic (including metatranscriptomic and metaproteomic) techniques have been adopted as complementary approaches to metagenomic sequencing to study functional characteristics and dynamics of microbial communities, aiming at a holistic understanding of how a community responds to changes in the environment. Currently, metaproteomic data are largely analyzed using the bioinformatics tools originally designed for bottom-up proteomics. In particular, recent metaproteomic studies employed a metagenome-guided approach, in which complete or fragmental protein-coding genes were first predicted from metagenomic sequences (i.e., contigs or scaffolds) acquired from the matched community samples, and the predicted protein sequences were then used in peptide identification. A key challenge of this approach is that the protein-coding genes predicted from assembled metagenomic contigs can be incomplete and fragmented due to the complexity of metagenomic samples and the short read length in metagenomic sequencing. To address this issue, in this paper, we present a graph-centric approach that exploits the de Bruijn graph structure reported by metagenome assembly algorithms to improve metagenome-guided peptide and protein identification in metaproteomics. We show that our method can identify many more peptides and proteins, improving the characterization of the proteins expressed in the microbial communities.
- Published
- 2016
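The graph-centric idea in the abstract above can be illustrated with a toy de Bruijn graph. The sketch below is not the Graph2Pro implementation; the function names, the reads, and the choice of k are all hypothetical and serve only to show how walking the graph can recover a sequence that spans two separate reads.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def spell_walks(graph, start, max_steps=50):
    """Spell every sequence reachable from `start` until a dead end or `max_steps`."""
    walks = []
    stack = [(start, start, 0)]
    while stack:
        node, seq, steps = stack.pop()
        successors = sorted(graph.get(node, ()))
        if not successors or steps == max_steps:
            walks.append(seq)
            continue
        for nxt in successors:
            # Each step along an edge extends the sequence by one character.
            stack.append((nxt, seq + nxt[-1], steps + 1))
    return walks


# Hypothetical reads that overlap only through the shared 3-mer "GCG".
reads = ["ATGGCG", "GCGTTA"]
graph = de_bruijn(reads, k=4)
walks = spell_walks(graph, "ATG")
print(walks)
```

In this toy case the walk joins both reads into one sequence, which is the kind of connectivity that is lost when only separate contigs are considered.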
36. Role of dynamic nuclear deformation on genomic architecture reorganization.
- Author
-
Seirin-Lee, Sungrim, Osakada, Fumitaka, Takeda, Junichi, Tashiro, Satoshi, Kobayashi, Ryo, Yamamoto, Takashi, and Ochiai, Hiroshi
- Subjects
NUCLEAR shapes ,CELL differentiation ,DEVELOPMENTAL biology ,CYTOLOGY ,MELANOPSIN - Abstract
Higher-order genomic architecture varies according to cell type and changes dramatically during differentiation. One of the remarkable examples of spatial genomic reorganization is the rod photoreceptor cell differentiation in nocturnal mammals. The inverted nuclear architecture found in adult mouse rod cells is formed through the reorganization of the conventional architecture during terminal differentiation. However, the mechanisms underlying these changes remain largely unknown. Here, we found that the dynamic deformation of nuclei via actomyosin-mediated contractility contributes to chromocenter clustering and promotes genomic architecture reorganization during differentiation by conducting an in cellulo experiment coupled with phase-field modeling. Similar patterns of dynamic deformation of the nucleus and a concomitant migration of the nuclear content were also observed in rod cells derived from the developing mouse retina. These results indicate that the common phenomenon of dynamic nuclear deformation, which accompanies dynamic cell behavior, can be a universal mechanism for spatiotemporal genomic reorganization. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. Mechanical properties of tubulin intra- and inter-dimer interfaces and their implications for microtubule dynamic instability.
- Author
-
Fedorov, Vladimir A., Orekhov, Philipp S., Kholina, Ekaterina G., Zhmurov, Artem A., Ataullakhanov, Fazoil I., Kovalenko, Ilya B., and Gudimchuk, Nikita B.
- Subjects
MICROTUBULES ,TUBULINS ,CELL morphology ,CYTOSKELETON ,GUANOSINE triphosphate ,COMPUTATIONAL biology ,PHYSICAL sciences - Abstract
Thirteen tubulin protofilaments, made of αβ-tubulin heterodimers, interact laterally to produce cytoskeletal microtubules. Microtubules exhibit the striking property of dynamic instability, manifested in their intermittent growth and shrinkage at both ends. This behavior is key to many cellular processes, such as cell division, migration, and maintenance of cell shape. Although assembly and disassembly of microtubules is known to be linked to hydrolysis of a guanosine triphosphate molecule in the pocket of β-tubulin, detailed mechanistic understanding of the corresponding conformational changes is still lacking. Here we take advantage of the recent generation of in-microtubule structures of tubulin to examine the properties of protofilaments, which serve as important microtubule assembly and disassembly intermediates. We find that initially straight tubulin protofilaments relax to similar non-radially curved and slightly twisted conformations. Our analysis further suggests that guanosine triphosphate hydrolysis primarily affects the flexibility and conformation of the inter-dimer interface, without a strong impact on the shape or flexibility of the αβ-heterodimer. Inter-dimer interfaces are significantly more flexible than intra-dimer interfaces. We argue that such a difference in flexibility could be key for the distinct stability of the plus and minus microtubule ends. The higher flexibility of the inter-dimer interface may have implications for development of pulling force by curving tubulin protofilaments during microtubule disassembly, a process of major importance for chromosome motions in mitosis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
38. Optimizing antibody affinity and stability by the automated design of the variable light-heavy chain interfaces.
- Author
-
Warszawski, Shira, Katz, Aliza Borenstein, Lipsh, Rosalie, Khmelnitsky, Lev, Ben Nissan, Gili, Javitt, Gabriel, Dym, Orly, Unger, Tamar, Knop, Orli, Albeck, Shira, Diskin, Ron, Fass, Deborah, Sharon, Michal, and Fleishman, Sarel J.
- Subjects
IMMUNOGLOBULINS ,LYSOZYMES ,INTERNET servers ,IMMUNOTECHNOLOGY ,ANTIBODY formation ,ANTIGENS - Abstract
Antibodies developed for research and clinical applications may exhibit suboptimal stability, expressibility, or affinity. Existing optimization strategies focus on surface mutations, whereas natural affinity maturation also introduces mutations in the antibody core, simultaneously improving stability and affinity. To systematically map the mutational tolerance of an antibody variable fragment (Fv), we performed yeast display and applied deep mutational scanning to an anti-lysozyme antibody and found that many of the affinity-enhancing mutations clustered at the variable light-heavy chain interface, within the antibody core. Rosetta design combined enhancing mutations, yielding a variant with tenfold higher affinity and substantially improved stability. To make this approach broadly accessible, we developed AbLIFT, an automated web server that designs multipoint core mutations to improve contacts between specific Fv light and heavy chains. We applied AbLIFT to two unrelated antibodies targeting the human antigens VEGF and QSOX1. Strikingly, the designs improved stability, affinity, and expression yields. The results provide proof-of-principle for bypassing laborious cycles of antibody engineering through automated computational affinity and stability design. [ABSTRACT FROM AUTHOR]
- Published
- 2019
39. ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins.
- Author
-
Tagore, Somnath, Gorohovski, Alessandro, Jensen, Lars Juhl, and Frenkel-Morgenstern, Milana
- Subjects
PROTEIN-protein interactions ,SCIENTIFIC literature ,CHIMERIC proteins ,CANCER genes ,DRUG metabolism ,GENE fusion - Abstract
Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein–protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: [ABSTRACT FROM AUTHOR]
- Published
- 2019
40. PRAS: Predicting functional targets of RNA binding proteins based on CLIP-seq peaks.
- Author
-
Lin, Jianan, Zhang, Yuping, Frankel, Wayne N., and Ouyang, Zhengqing
- Subjects
RNA-binding proteins ,NON-coding RNA ,RNA-protein interactions ,PHYSICAL & theoretical chemistry - Abstract
RNA-protein interaction plays important roles in post-transcriptional regulation. Recent advancements in cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) technologies make it possible to detect the binding peaks of a given RNA binding protein (RBP) at transcriptome scale. However, it is still challenging to predict the functional consequences of RBP binding peaks. In this study, we propose the Protein-RNA Association Strength (PRAS), which integrates the intensities and positions of the binding peaks of RBPs for functional mRNA targets prediction. We illustrate the superiority of PRAS over existing approaches on predicting the functional targets of two related but divergent CELF (CUGBP, ELAV-like factor) RBPs in mouse brain and muscle. We also demonstrate the potential of PRAS for wide adoption by applying it to the enhanced CLIP-seq (eCLIP) datasets of 37 RNA decay related RBPs in two human cell lines. PRAS can be utilized to investigate any RBPs with available CLIP-seq peaks. PRAS is freely available at . [ABSTRACT FROM AUTHOR]
- Published
- 2019
41. On the optimal design of metabolic RNA labeling experiments.
- Author
-
Uvarovskii, Alexey, Naarmann-de Vries, Isabel S., and Dieterich, Christoph
- Subjects
DRUG labeling ,RNA ,NUCLEOTIDE sequence - Abstract
Massively parallel RNA sequencing (RNA-seq) in combination with metabolic labeling has become the de facto standard approach to study alterations in RNA transcription, processing or decay. Regardless of advances in the experimental protocols and techniques, every experimentalist needs to specify the key aspects of experimental design: for example, which protocol should be used (biochemical separation vs. nucleotide conversion) and what is the optimal labeling time? In this work, we provide approximate answers to these questions using the asymptotic theory of optimal design. Specifically, we investigate how the variance of degradation rate estimates depends on the labeling time and derive the optimal time for any given degradation rate. Subsequently, we show that an increase in sample numbers should be preferred over an increase in sequencing depth. Lastly, we provide some guidance on use cases when laborious biochemical separation outcompetes recent nucleotide conversion based methods (such as SLAMseq) and show how inefficient conversion influences the precision of estimates. Code and documentation can be found at . [ABSTRACT FROM AUTHOR]
- Published
- 2019
42. Predicting kinase inhibitors using bioactivity matrix derived informer sets.
- Author
-
Zhang, Huikun, Ericksen, Spencer S., Lee, Ching-pei, Ananiev, Gene E., Wlodarchak, Nathan, Yu, Peng, Mitchell, Julie C., Gitter, Anthony, Wright, Stephen J., Hoffmann, F. Michael, Wildman, Scott A., and Newton, Michael A.
- Subjects
CHEMICAL inhibitors ,KINASE inhibitors - Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS. [ABSTRACT FROM AUTHOR]
- Published
- 2019
43. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift.
- Author
-
Khatri, Bhavin S. and Goldstein, Richard A.
- Subjects
BIOPHYSICS ,EVOLUTIONARY models ,GENETIC drift ,GLACIAL drift ,DEVELOPMENTAL biology ,GENETIC speciation - Abstract
Developmental system drift is a likely mechanism for the origin of hybrid incompatibilities between closely related species. We examine here the detailed mechanistic basis of hybrid incompatibilities between two allopatric lineages, for a genotype-phenotype map of developmental system drift under stabilising selection, where an organismal phenotype is conserved, but the underlying molecular phenotypes and genotype can drift. This leads to a number of emergent phenomena not obtainable by modelling genotype or phenotype alone. Our results show that: 1) speciation is more rapid at smaller population sizes, with a characteristic, Orr-like, power law, but slow at large population sizes, characterised by a sub-diffusive growth law; 2) the molecular phenotypes under weakest selection contribute to the earliest incompatibilities; and 3) pair-wise incompatibilities dominate over higher order, contrary to previous predictions that the latter should dominate. The population size effect we find is consistent with previous results on allopatric divergence of transcription factor-DNA binding, where smaller populations have common ancestors with a larger drift load because genetic drift favours phenotypes which have a larger number of genotypes (higher sequence entropy) over more fit phenotypes which have far fewer genotypes; this means fewer substitutions are required in either lineage before incompatibilities arise. Overall, our results indicate that biophysics and population size provide a much stronger constraint to speciation than suggested by previous models, and point to a general mechanistic principle of how incompatibilities arise under stabilising selection for an organismal phenotype. [ABSTRACT FROM AUTHOR]
- Published
- 2019
44. Assessing key decisions for transcriptomic data integration in biochemical networks.
- Author
-
Richelle, Anne, Joshi, Chintan, and Lewis, Nathan E.
- Subjects
GENE expression ,DATA integration ,GENE regulatory networks ,OVERLAY networks ,COMPUTATIONAL biology ,METABOLIC models - Abstract
To gain insights into complex biological processes, genome-scale data (e.g., RNA-Seq) are often overlaid on biochemical networks. However, many networks do not have a one-to-one relationship between genes and network edges, due to the existence of isozymes and protein complexes. Therefore, decisions must be made on how to overlay data onto networks. For example, for metabolic networks, these decisions include (1) how to integrate gene expression levels using gene-protein-reaction rules, (2) the approach used for selection of thresholds on expression data to consider the associated gene as “active”, and (3) the order in which these steps are imposed. However, the influence of these decisions has not been systematically tested. We compared 20 decision combinations using a transcriptomic dataset across 32 tissues and showed that the definition of which reactions may be considered active (i.e., reactions of the GEM with a non-zero expression level after overlaying the data) is mainly influenced by the thresholding approach used. To determine the most appropriate decisions, we evaluated how these decisions impact the acquisition of tissue-specific active reaction lists that recapitulate organ-system tissue groups. These results will provide guidelines to improve data analyses with biochemical networks and facilitate the construction of context-specific metabolic models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
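The three decisions listed in the abstract above (gene-protein-reaction integration, thresholding, and their order) can be made concrete with a small sketch. It is not the authors' code: the nested-tuple rule encoding, the AND = minimum and OR = sum conventions, the expression values, and the cutoff are all assumptions chosen for illustration.

```python
def gpr_score(rule, expr):
    """Score one reaction from a rule such as ("or", ("and", "g1", "g2"), "g3").

    Assumed convention: AND (protein complex) takes the minimum,
    OR (isozymes) takes the sum of its parts.
    """
    if isinstance(rule, str):  # a leaf is a gene identifier
        return expr.get(rule, 0.0)
    op, *parts = rule
    scores = [gpr_score(part, expr) for part in parts]
    if op == "and":
        return min(scores)
    if op == "or":
        return sum(scores)
    raise ValueError(f"unknown operator: {op}")


def apply_cutoff(value, cutoff):
    """Values below the cutoff are treated as inactive (zero)."""
    return value if value >= cutoff else 0.0


# Hypothetical expression values and rule.
expression = {"g1": 8.0, "g2": 1.5, "g3": 1.0}
rule = ("or", ("and", "g1", "g2"), "g3")
cutoff = 2.0

# Order A: threshold each gene first, then apply the rule.
thresholded = {g: apply_cutoff(v, cutoff) for g, v in expression.items()}
score_a = gpr_score(rule, thresholded)

# Order B: apply the rule first, then threshold the reaction score.
score_b = apply_cutoff(gpr_score(rule, expression), cutoff)

print(score_a, score_b)
```

Under these assumptions the same reaction is inactive in one order and active in the other, which shows why the order of the steps is itself a decision worth testing.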
45. Disease gene prediction for molecularly uncharacterized diseases.
- Author
-
Cáceres, Juan J. and Paccanaro, Alberto
- Subjects
DISEASES ,RESEARCH methodology ,COMPUTATIONAL biology ,MOLECULAR biology - Abstract
Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of disease genes associated with a disease, neighbourhood-based methods and random walkers exploit the interactome allowing the prediction of further genes for that disease. In general, however, diseases with no known molecular basis constitute a challenge. Here we present a novel network approach to prioritize gene-disease associations that is able to also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases in OMIM, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, prove that Cardigan is able to accurately predict disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross validation tests show how our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 14%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 87%-299%. [ABSTRACT FROM AUTHOR]
- Published
- 2019
46. DART-ID increases single-cell proteome coverage.
- Author
-
Chen, Albert Tian, Franks, Alexander, and Slavov, Nikolai
- Subjects
TANDEM mass spectrometry ,MONOCYTES ,RF values (Chromatography) ,LEUCOCYTES ,STATISTICAL power analysis ,LIQUID chromatography - Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30–50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at . [ABSTRACT FROM AUTHOR]
- Published
- 2019
47. Energetic costs of cellular and therapeutic control of stochastic mitochondrial DNA populations.
- Author
-
Hoitzing, Hanne, Gammage, Payam A., Haute, Lindsey van, Minczuk, Michal, Johnston, Iain G., and Jones, Nick S.
- Subjects
MITOCHONDRIAL DNA ,BOTANY ,CYTOLOGY ,GENE therapy ,PHYSICAL sciences ,MOLECULAR biology - Abstract
The dynamics of the cellular proportion of mutant mtDNA molecules is crucial for mitochondrial diseases. Cellular populations of mitochondria are under homeostatic control, but the details of the control mechanisms involved remain elusive. Here, we use stochastic modelling to derive general results for the impact of cellular control on mtDNA populations, the cost to the cell of different mtDNA states, and the optimisation of therapeutic control of mtDNA populations. This formalism yields a wealth of biological results, including that an increasing mtDNA variance can increase the energetic cost of maintaining a tissue, that intermediate levels of heteroplasmy can be more detrimental than homoplasmy even for a dysfunctional mutant, that heteroplasmy distribution (not mean alone) is crucial for the success of gene therapies, and that long-term rather than short intense gene therapies are more likely to beneficially impact mtDNA populations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
48. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome.
- Author
-
Pagel, Kymberleigh A., Antaki, Danny, Lian, AoJie, Mort, Matthew, Cooper, David N., Sebat, Jonathan, Iakoucheva, Lilia M., Mooney, Sean D., and Radivojac, Predrag
- Subjects
HUMAN genome ,AUTISM spectrum disorders ,MICROBIAL virulence ,RECURRENT neural networks ,POST-translational modification ,PHYSICAL sciences - Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2019
49. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences.
- Author
-
Lee, Ingoo, Keum, Jongsoo, and Nam, Hojung
- Subjects
AMINO acid sequence ,DEEP learning ,MATHEMATICAL convolutions ,CARRIER proteins ,BINDING sites ,PROTEIN models - Abstract
Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown not to be sufficiently informative to predict accurate DTIs. Thus, in this study, we propose a deep learning based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acid subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2019
50. Metabolic reprogramming dynamics in tumor spheroids: Insights from a multicellular, multiscale model.
- Author
-
Roy, Mahua and Finley, Stacey D.
- Subjects
BIOLOGICAL mathematical modeling ,TUMOR growth ,CANCER cell growth ,CANCER cell proliferation ,CANCER prevention - Abstract
Mathematical modeling provides the predictive ability to understand the metabolic reprogramming and complex pathways that mediate cancer cells’ proliferation. We present a mathematical model using a multiscale, multicellular approach to simulate avascular tumor growth, applied to pancreatic cancer. The model spans three distinct spatial and temporal scales. At the extracellular level, reaction diffusion equations describe nutrient concentrations over a span of seconds. At the cellular level, a lattice-based energy driven stochastic approach describes cellular phenomena including adhesion, proliferation, viability and cell state transitions, occurring on the timescale of hours. At the sub-cellular level, we incorporate a detailed kinetic model of intracellular metabolite dynamics on the timescale of minutes, which enables the cells to take up and excrete metabolites and use the metabolites to generate energy and building blocks for cell growth. This is a particularly novel aspect of the model. Certain defined criteria for the concentrations of intracellular metabolites lead to cancer cell growth, proliferation or death. Overall, we model the evolution of the tumor in both time and space. Starting with a cluster of tumor cells, the model produces an avascular tumor that quantitatively and qualitatively mimics experimental measurements of multicellular tumor spheroids. Through our model simulations, we can investigate the response of individual intracellular species under a metabolic perturbation and investigate how that response contributes to the response of the tumor as a whole. The predicted responses of intracellular metabolites under various targeted strategies are difficult to resolve with experimental techniques. Thus, the model can give novel predictions as to the response of the tumor as a whole, identify potential therapies to impede tumor growth, and predict the effects of those therapeutic strategies. In particular, the model provides quantitative insight into the dynamic reprogramming of tumor cells at the intracellular level in response to specific metabolic perturbations. Overall, the model is a useful framework to study targeted metabolic strategies for inhibiting tumor growth. [ABSTRACT FROM AUTHOR]
- Published
- 2019