901 results
Search Results
2. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.
- Author
-
Wu, Stephen Gang, Wang, Yuxuan, Jiang, Wu, Oyetunde, Tolutola, Yao, Ruilian, Zhang, Xuehong, Shimizu, Kazuyuki, Tang, Yinjie J., and Bao, Forrest Sheng
- Subjects
METABOLIC flux analysis, SUPPORT vector machines, CELL metabolism, MACHINE learning, STOICHIOMETRY
- Abstract
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux () that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species. [ABSTRACT FROM AUTHOR]
- Published
- 2016
3. Personalized glucose forecasting for type 2 diabetes using data assimilation.
- Author
-
Albers, David J., Levine, Matthew, Gluckman, Bruce, Ginsberg, Henry, Hripcsak, George, and Mamykina, Lena
- Subjects
BLOOD sugar monitoring, TYPE 2 diabetes, QUALITY of life, GLYCEMIC control, BAYESIAN analysis, GAUSSIAN processes
- Abstract
Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast an impact of a given meal on an individual’s blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to make a personalized model selection for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. 
The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed in accuracy expert forecasts. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges. [ABSTRACT FROM AUTHOR]
- Published
- 2017
4. Enzyme sequestration by the substrate: An analysis in the deterministic and stochastic domains.
- Author
-
Petrides, Andreas and Vinnicombe, Glenn
- Subjects
PHOSPHORYLATION, PHOSPHATASES, KINASES, ENZYMES, SEQUESTRATION (Chemistry)
- Abstract
This paper is concerned with the potential multistability of protein concentrations in the cell. That is, situations where one, or a family of, proteins may sit at one of two or more different steady state concentrations in otherwise identical cells, and in spite of them being in the same environment. For models of multisite protein phosphorylation for example, in the presence of excess substrate, it has been shown that the achievable number of stable steady states can increase linearly with the number of phosphosites available. In this paper, we analyse the consequences of adding enzyme docking to these and similar models, with the resultant sequestration of phosphatase and kinase by the fully unphosphorylated and by the fully phosphorylated substrates respectively. In the large molecule numbers limit, where deterministic analysis is applicable, we prove that there are always values for these rates of sequestration which, when exceeded, limit the extent of multistability. For the models considered here, these numbers are much smaller than the affinity of the enzymes to the substrate when it is in a modifiable state. As substrate enzyme-sequestration is increased, we further prove that the number of steady states will inevitably be reduced to one. For smaller molecule numbers a stochastic analysis is more appropriate, where multistability in the large molecule numbers limit can manifest itself as multimodality of the probability distribution; the system spending periods of time in the vicinity of one mode before jumping to another. Here, we find that substrate enzyme sequestration can induce bimodality even in systems where only a single steady state can exist at large numbers. 
To facilitate this analysis, we develop a weakly chained diagonally dominant M-matrix formulation of the Chemical Master Equation, allowing greater insights in the way particular mechanisms, like enzyme sequestration, can shape probability distributions and therefore exhibit different behaviour across different regimes. [ABSTRACT FROM AUTHOR]
- Published
- 2018
5. Ten simple rules to create biological network figures for communication.
- Author
-
Marai, G. Elisabeta, Pinaud, Bruno, Bühler, Katja, Lex, Alexander, and Morris, John H.
- Subjects
TELECOMMUNICATION systems, BIOLOGICAL networks, MEDICAL literature, PHYSICAL sciences, REFERENCE sources, BIOLOGY
- Abstract
Biological network figures are ubiquitous in the biology and medical literature. On the one hand, a good network figure can quickly provide information about the nature and degree of interactions between items and enable inferences about the reason for those interactions. On the other hand, good network figures are difficult to create. In this paper, we outline 10 simple rules for creating biological network figures for communication, from choosing layouts, to applying color or other channels to show attributes, to the use of layering and separation. These rules are accompanied by illustrative examples. We also provide a concise set of references and additional resources for each rule. [ABSTRACT FROM AUTHOR]
- Published
- 2019
6. Weak coupling between intracellular feedback loops explains dissociation of clock gene dynamics.
- Author
-
Schmal, Christoph, Ono, Daisuke, Myung, Jihwan, Pett, J. Patrick, Honma, Sato, Honma, Ken-Ichi, Herzel, Hanspeter, and Tokuda, Isao T.
- Subjects
MOLECULAR clock, CIRCADIAN rhythms, GENE expression, PHYSICAL sciences, CYTOLOGY
- Abstract
Circadian rhythms are generated by interlocked transcriptional-translational negative feedback loops (TTFLs), the molecular process implemented within a cell. The contributions, weighting and balancing between the multiple feedback loops remain debated. Dissociated, free-running dynamics in the expression of distinct clock genes has been described in recent experimental studies that applied various perturbations such as slice preparations, light pulses, jet-lag, and culture medium exchange. In this paper, we provide evidence that this “presumably transient” dissociation of circadian gene expression oscillations may occur at the single-cell level. Conceptual and detailed mechanistic mathematical modeling suggests that such dissociation is due to a weak interaction between multiple feedback loops present within a single cell. The dissociable loops provide insights into underlying mechanisms and general design principles of the molecular circadian clock. [ABSTRACT FROM AUTHOR]
- Published
- 2019
7. Ensemble of decision tree reveals potential miRNA-disease associations.
- Author
-
Chen, Xing, Zhu, Chi-Chi, and Yin, Jun
- Subjects
DIMENSION reduction (Statistics), DECISION trees, RENAL cancer, THERAPEUTICS, BREAST tumors, MICRORNA
- Abstract
In recent years, increasing associations between microRNAs (miRNAs) and human diseases have been identified. Based on accumulating biological data, many computational models for potential miRNA-disease associations inference have been developed, which saves time and expenditure on experimental studies, making great contributions to researching molecular mechanism of human diseases and developing new drugs for disease treatment. In this paper, we proposed a novel computational method named Ensemble of Decision Tree based MiRNA-Disease Association prediction (EDTMDA), which innovatively built a computational framework integrating ensemble learning and dimensionality reduction. For each miRNA-disease pair, the feature vector was extracted by calculating the statistical measures, graph theoretical measures, and matrix factorization results for the miRNA and disease, respectively. Then multiple base learnings were built to yield many decision trees (DTs) based on random selection of negative samples and miRNA/disease features. Particularly, Principal Components Analysis was applied to each base learning to reduce feature dimensionality and hence remove the noise or redundancy. Average strategy was adopted for these DTs to get final association scores between miRNAs and diseases. In model performance evaluation, EDTMDA showed AUC of 0.9309 in global leave-one-out cross validation (LOOCV) and AUC of 0.8524 in local LOOCV. Additionally, AUC of 0.9192+/-0.0009 in 5-fold cross validation proved the model’s reliability and stability. Furthermore, three types of case studies for four human diseases were implemented. As a result, 94% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 96% (Breast Neoplasms) and 88% (Carcinoma Hepatocellular) of top 50 predicted miRNAs were confirmed by experimental evidences in literature. [ABSTRACT FROM AUTHOR]
- Published
- 2019
8. Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data.
- Author
-
Ralph, Duncan K. and Matsen, Frederick A., IV
- Subjects
B cell receptors, IMMUNOGLOBULIN genes, B cells, ALLELES
- Abstract
The collection of immunoglobulin genes in an individual’s germline, which gives rise to B cell receptors via recombination, is known to vary significantly across individuals. In humans, for example, each individual has only a fraction of the several hundred known V alleles. Furthermore, the currently-accepted set of known V alleles is both incomplete (particularly for non-European samples), and contains a significant number of spurious alleles. The resulting uncertainty as to which immunoglobulin alleles are present in any given sample results in inaccurate B cell receptor sequence annotations, and in particular inaccurate inferred naive ancestors. In this paper we first show that the currently widespread practice of aligning each sequence to its closest match in the full set of IMGT alleles results in a very large number of spurious alleles that are not in the sample’s true set of germline V alleles. We then describe a new method for inferring each individual’s germline gene set from deep sequencing data, and show that it improves upon existing methods by making a detailed comparison on a variety of simulated and real data samples. This new method has been integrated into the partis annotation and clonal family inference package, available at , and is run by default without affecting overall run time. [ABSTRACT FROM AUTHOR]
- Published
- 2019
9. PrediTALE: A novel model learned from quantitative data allows for new perspectives on TALE targeting.
- Author
-
Erkes, Annett, Mücke, Stefanie, Reschke, Maik, Boch, Jens, and Grau, Jan
- Subjects
TANDEM repeats, PLANT genes, NUCLEOTIDE sequence, COMPUTATIONAL biology, GENE targeting, FORKHEAD transcription factors
- Abstract
Plant-pathogenic Xanthomonas bacteria secrete transcription activator-like effectors (TALEs) into host cells, where they act as transcriptional activators on plant target genes to support bacterial virulence. TALEs have a unique modular DNA-binding domain composed of tandem repeats. Two amino acids within each tandem repeat, termed repeat-variable diresidues, bind to contiguous nucleotides on the DNA sequence and determine target specificity. In this paper, we propose a novel approach for TALE target prediction to identify potential virulence targets. Our approach accounts for recent findings concerning TALE targeting, including frame-shift binding by repeats of aberrant lengths, and the flexible strand orientation of target boxes relative to the transcription start of the downstream target gene. The computational model can account for dependencies between adjacent RVD positions. Model parameters are learned from the wealth of quantitative data that have been generated over the last years. We benchmark the novel approach, termed PrediTALE, using RNA-seq data after Xanthomonas infection in rice, and find an overall improvement of prediction performance compared with previous approaches. Using PrediTALE, we are able to predict several novel putative virulence targets. However, we also observe that no target genes are predicted by any prediction tool for several TALEs, which we term orphan TALEs for this reason. We postulate that one explanation for orphan TALEs are incomplete gene annotations and, hence, propose to replace promoterome-wide by genome-wide scans for target boxes. We demonstrate that known targets from promoterome-wide scans may be recovered by genome-wide scans, whereas the latter, combined with RNA-seq data, are able to detect putative targets independent of existing gene annotations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. Properties of cardiac conduction in a cell-based computational model.
- Author
-
Jæger, Karoline Horgmo, Edwards, Andrew G., McCulloch, Andrew, and Tveito, Aslak
- Subjects
CARDIAC arrest, HEART cells, HEART conduction system, COMPUTATIONAL acoustics, SODIUM channels
- Abstract
The conduction of electrical signals through cardiac tissue is essential for maintaining the function of the heart, and conduction abnormalities are known to potentially lead to life-threatening arrhythmias. The properties of cardiac conduction have therefore been the topic of intense study for decades, but a number of questions related to the mechanisms of conduction still remain unresolved. In this paper, we demonstrate how the so-called EMI model may be used to study some of these open questions. In the EMI model, the extracellular space, the cell membrane, the intracellular space and the cell connections are all represented as separate parts of the computational domain, and the model therefore allows for study of local properties that are hard to represent in the classical homogenized bidomain or monodomain models commonly used to study cardiac conduction. We conclude that a non-uniform sodium channel distribution increases the conduction velocity and decreases the time delays over gap junctions of reduced coupling in the EMI model simulations. We also present a theoretical optimal cell length with respect to conduction velocity and consider the possibility of ephaptic coupling (i.e. cell-to-cell coupling through the extracellular potential) acting as an alternative or supporting mechanism to gap junction coupling. We conclude that for a non-uniform distribution of sodium channels and a sufficiently small intercellular distance, ephaptic coupling can influence the dynamics of the sodium channels and potentially provide cell-to-cell coupling when the gap junction connection is absent. [ABSTRACT FROM AUTHOR]
- Published
- 2019
11. A Bayesian framework for the analysis of systems biology models of the brain.
- Author
-
Russell-Buckland, Joshua, Barnes, Christopher P., and Tachtsidis, Ilias
- Subjects
BAYESIAN analysis, BRAIN physiology, SYSTEMS biology, SENSITIVITY analysis, MODELS & modelmaking
- Abstract
Systems biology models are used to understand complex biological and physiological systems. Interpretation of these models is an important part of developing this understanding. These models are often fit to experimental data in order to understand how the system has produced various phenomena or behaviour that are seen in the data. In this paper, we have outlined a framework that can be used to perform Bayesian analysis of complex systems biology models. In particular, we have focussed on analysing a systems biology of the brain using both simulated and measured data. By using a combination of sensitivity analysis and approximate Bayesian computation, we have shown that it is possible to obtain distributions of parameters that can better guard against misinterpretation of results, as compared to a maximum likelihood estimate based approach. This is done through analysis of simulated and experimental data. NIRS measurements were simulated using the same simulated systemic input data for the model in a ‘healthy’ and ‘impaired’ state. By analysing both of these datasets, we show that different parameter spaces can be distinguished and compared between different physiological states or conditions. Finally, we analyse experimental data using the new Bayesian framework and the previous maximum likelihood estimate approach, showing that the Bayesian approach provides a more complete understanding of the parameter space. [ABSTRACT FROM AUTHOR]
- Published
- 2019
12. A kinetic model for Brain-Derived Neurotrophic Factor mediated spike timing-dependent LTP.
- Author
-
Solinas, Sergio M. G., Edelmann, Elke, Leßmann, Volkmar, and Migliore, Michele
- Subjects
NEUROTROPHINS, MAMMALS, NERVOUS system, NEURONS, NEUROLOGY
- Abstract
Across the mammalian nervous system, neurotrophins control synaptic plasticity, neuromodulation, and neuronal growth. The neurotrophin Brain Derived Neurotrophic Factor (BDNF) is known to promote structural and functional synaptic plasticity in the hippocampus, the cerebral cortex, and many other brain areas. In recent years, a wealth of data has been accumulated revealing the paramount importance of BDNF for neuronal function. BDNF signaling gives rise to multiple complex signaling pathways that mediate neuronal survival and differentiation during development, and formation of new memories. These different roles of BDNF for neuronal function have essential consequences if BDNF signaling in the brain is reduced. Thus, BDNF knock-out mice or mice that are deficient in BDNF receptor signaling via TrkB and p75 receptors show deficits in neuronal development, synaptic plasticity, and memory formation. Accordingly, BDNF signaling dysfunctions are associated with many neurological and neurodegenerative conditions including Alzheimer's and Huntington's disease. However, despite the widespread implications of BDNF-dependent signaling in synaptic plasticity in healthy and pathological conditions, the interplay of the involved different biochemical pathways at the synaptic level remained mostly unknown. In this paper, we investigated the role of BDNF/TrkB signaling in spike-timing dependent plasticity (STDP) in rodent hippocampus CA1 pyramidal cells, by implementing the first subcellular model of BDNF regulated, spike timing-dependent long-term potentiation (t-LTP). The model is based on previously published experimental findings on STDP and accounts for the observed magnitude, time course, stimulation pattern and BDNF-dependence of t-LTP. It allows interpreting the main experimental findings concerning specific biomolecular processes, and it can be expanded to take into account more detailed biochemical reactions. 
The results point out a few predictions on how to enhance LTP induction in such a way to rescue or improve cognitive functions under pathological conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2019
13. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities.
- Author
-
Wang, Lei, You, Zhu-Hong, Chen, Xing, Li, Yang-Ming, Dong, Ya-Nan, Li, Li-Ping, and Zheng, Kai
- Subjects
LOGISTIC model (Demography), MICRORNA, MEDICAL genetics, RNA sequencing, PREDICTION models, BREAST tumors, NATURAL language processing, LYMPHOMA diagnosis
- Abstract
Emerging evidence has shown microRNAs (miRNAs) play an important role in human disease research. Identifying potential association among them is significant for the development of pathology, diagnose and therapy. However, only a tiny portion of all miRNA-disease pairs in the current datasets are experimentally validated. This prompts the development of high-precision computational methods to predict real interaction pairs. In this paper, we propose a new model of Logistic Model Tree for predicting miRNA-Disease Association (LMTRDA) by fusing multi-source information including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. In particular, we introduce miRNA sequence information and extract its features using natural language processing technique for the first time in the miRNA-disease prediction model. In the cross-validation experiment, LMTRDA obtained 90.51% prediction accuracy with 92.55% sensitivity at the AUC of 90.54% on the HMDD V3.0 dataset. To further evaluate the performance of LMTRDA, we compared it with different classifier and feature descriptor models. In addition, we also validate the predictive ability of LMTRDA in human diseases including Breast Neoplasms, Breast Neoplasms and Lymphoma. As a result, 28, 27 and 26 out of the top 30 miRNAs associated with these diseases were verified by experiments in different kinds of case studies. These experimental results demonstrate that LMTRDA is a reliable model for predicting the association among miRNAs and diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2019
14. Predicting the mechanism and rate of H-NS binding to AT-rich DNA.
- Author
-
Riccardi, Enrico, van Mastbergen, Eva C., Navarre, William Wiley, and Vreede, Jocelyne
- Subjects
BACTERIA, ARGININE, DNA, BIOCHEMISTRY, PROTEINS
- Abstract
Bacteria contain several nucleoid-associated proteins that organize their genomic DNA into the nucleoid by bending, wrapping or bridging DNA. The Histone-like Nucleoid Structuring protein H-NS found in many Gram-negative bacteria is a DNA bridging protein and can structure DNA by binding to two separate DNA duplexes or to adjacent sites on the same duplex, depending on external conditions. Several nucleotide sequences have been identified to which H-NS binds with high affinity, indicating H-NS prefers AT-rich DNA. To date, highly detailed structural information of the H-NS DNA complex remains elusive. Molecular simulation can complement experiments by modelling structures and their time evolution in atomistic detail. In this paper we report an exploration of the different binding modes of H-NS to a high affinity nucleotide sequence and an estimate of the associated rate constant. By means of molecular dynamics simulations, we identified three types of binding for H-NS to AT-rich DNA. To further sample the transitions between these binding modes, we performed Replica Exchange Transition Interface Sampling, providing predictions of the mechanism and rate constant of H-NS binding to DNA. H-NS interacts with the DNA through a conserved QGR motif, aided by a conserved arginine at position 93. The QGR motif interacts first with phosphate groups, followed by the formation of hydrogen bonds between acceptors in the DNA minor groove and the sidechains of either Q112 or R114. After R114 inserts into the minor groove, the rest of the QGR motif follows. Full insertion of the QGR motif in the minor groove is stable over several tens of nanoseconds, and involves hydrogen bonds between the bases and both backbone and sidechains of the QGR motif. The rate constant for the process of H-NS binding to AT-rich DNA resulting in full insertion of the QGR motif is in the order of 10⁶ M⁻¹ s⁻¹, which is rate limiting compared to the non-specific association of H-NS to the DNA backbone at a rate of 10⁸ M⁻¹ s⁻¹. [ABSTRACT FROM AUTHOR]
- Published
- 2019
15. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction.
- Author
-
Allahyar, Amin, Ubels, Joske, and de Ridder, Jeroen
- Subjects
CANCER patients, GENE expression, CANCER treatment, HEALTH outcome assessment, MOLECULAR genetics
- Abstract
Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and for instance suggest that utilizing random networks performs on par to networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation which more closely emulates real world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome. [ABSTRACT FROM AUTHOR]
- Published
- 2019
16. Thermodynamic model of gene regulation for the Or59b olfactory receptor in Drosophila.
- Author
-
González, Alejandra, Jafari, Shadi, Zenere, Alberto, Alenius, Mattias, and Altafini, Claudio
- Subjects
OLFACTORY receptors, GENETIC regulation, DROSOPHILA, EUKARYOTES, TRANSCRIPTION factors, THERMODYNAMICS
- Abstract
Complex eukaryotic promoters normally contain multiple cis-regulatory sequences for different transcription factors (TFs). The binding patterns of the TFs to these sites, as well as the way the TFs interact with each other and with the RNA polymerase (RNAp), lead to combinatorial problems rarely understood in detail, especially under varying epigenetic conditions. The aim of this paper is to build a model describing how the main regulatory cluster of the olfactory receptor Or59b drives transcription of this gene in Drosophila. The cluster-driven expression of this gene is represented as the equilibrium probability of RNAp being bound to the promoter region, using a statistical thermodynamic approach. The RNAp equilibrium probability is computed in terms of the occupancy probabilities of the single TFs of the cluster to the corresponding binding sites, and of the interaction rules among TFs and RNAp, using experimental data of Or59b expression to tune the model parameters. The model reproduces correctly the changes in RNAp binding probability induced by various mutation of specific sites and epigenetic modifications. Some of its predictions have also been validated in novel experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2019
17. Global analysis of N6-methyladenosine functions and its disease association using deep learning and network-based methods.
- Author
-
Zhang, Song-yao, Zhang, Shao-wu, Fan, Xiao-nan, Meng, Jia, Chen, Yidong, Gao, Shou-Jiang, and Huang, Yufei
- Subjects
PHYSIOLOGICAL effects of adenosine, DEEP learning, MESSENGER RNA, PROTEIN-protein interactions, CELL proliferation
- Abstract
N6-methyladenosine (m6A) is the most abundant methylation, existing in >25% of human mRNAs. Exciting recent discoveries indicate the close involvement of m6A in regulating many different aspects of mRNA metabolism and diseases like cancer. However, our current knowledge about how m6A levels are controlled and whether and how regulation of m6A levels of a specific gene can play a role in cancer and other diseases is mostly elusive. We propose in this paper a computational scheme for predicting m6A-regulated genes and m6A-associated disease, which includes Deep-m6A, the first model for detecting condition-specific m6A sites from MeRIP-Seq data with a single base resolution using deep learning and Hot-m6A, a new network-based pipeline that prioritizes functional significant m6A genes and its associated diseases using the Protein-Protein Interaction (PPI) and gene-disease heterogeneous networks. We applied Deep-m6A and this pipeline to 75 MeRIP-seq human samples, which produced a compact set of 709 functionally significant m6A-regulated genes and nine functionally enriched subnetworks. The functional enrichment analysis of these genes and networks reveal that m6A targets key genes of many critical biological processes including transcription, cell organization and transport, and cell proliferation and cancer-related pathways such as Wnt pathway. The m6A-associated disease analysis prioritized five significantly associated diseases including leukemia and renal cell carcinoma. These results demonstrate the power of our proposed computational scheme and provide new leads for understanding m6A regulatory functions and its roles in diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2019
18. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions.
- Author
-
Zhang, Wen, Tang, Guifeng, Huang, Feng, Zhang, Xining, Yue, Xiang, and Wu, Wenjian
- Subjects
RNA-protein interactions, GENETIC regulation, RNA interference, RNA splicing, ADENYLATION (Biochemistry)
- Abstract
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but features are not available for all lncRNAs or proteins; most of existing methods are not capable of predicting interacting proteins (or lncRNAs) for new lncRNAs (or proteins), which don’t have known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features with a feature projection ensemble learning frame. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find out novel lncRNA-protein interactions, which are confirmed by literature. Finally, we construct a user-friendly web server, available at . [ABSTRACT FROM AUTHOR]
- Published
- 2018
19. Bayesian adaptive dual control of deep brain stimulation in a computational model of Parkinson’s disease.
- Author
-
Grado, Logan L., Johnson, Matthew D., and Netoff, Theoden I.
- Subjects
BAYESIAN analysis, PROBABILITY theory, BRAIN stimulation, KINDLING (Neurology), TRANSCRANIAL magnetic stimulation
- Abstract
In this paper, we present a novel Bayesian adaptive dual controller (ADC) for autonomously programming deep brain stimulation devices. We evaluated the Bayesian ADC’s performance in the context of reducing beta power in a computational model of Parkinson’s disease, in which it was tasked with finding the set of stimulation parameters which optimally reduced beta power as fast as possible. Here, the Bayesian ADC has dual goals: (a) to minimize beta power by exploiting the best parameters found so far, and (b) to explore the space to find better parameters, thus allowing for better control in the future. The Bayesian ADC is composed of two parts: an inner parameterized feedback stimulator and an outer parameter adjustment loop. The inner loop operates on a short time scale, delivering stimulus based upon the phase and power of the beta oscillation. The outer loop operates on a long time scale, observing the effects of the stimulation parameters and using Bayesian optimization to intelligently select new parameters to minimize the beta power. We show that the Bayesian ADC can efficiently optimize stimulation parameters, and is superior to other optimization algorithms. The Bayesian ADC provides a robust and general framework for tuning stimulation parameters, can be adapted to use any feedback signal, and is applicable across diseases and stimulator designs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
20. A Scalable Computational Framework for Establishing Long-Term Behavior of Stochastic Reaction Networks.
- Author
-
Gupta, Ankit, Briat, Corentin, and Khammash, Mustafa
- Subjects
COMPUTATIONAL biology, STOCHASTIC processes, RANDOM variables, MATHEMATICAL optimization, INFORMATION networks
- Abstract
Reaction networks are systems in which the populations of a finite number of species evolve through predefined interactions. Such networks are found as modeling tools in many biological disciplines such as biochemistry, ecology, epidemiology, immunology, systems biology and synthetic biology. It is now well-established that, for small population sizes, stochastic models for biochemical reaction networks are necessary to capture randomness in the interactions. The tools for analyzing such models, however, still lag far behind their deterministic counterparts. In this paper, we bridge this gap by developing a constructive framework for examining the long-term behavior and stability properties of the reaction dynamics in a stochastic setting. In particular, we address the problems of determining ergodicity of the reaction dynamics, which is analogous to having a globally attracting fixed point for deterministic dynamics. We also examine when the statistical moments of the underlying process remain bounded with time and when they converge to their steady state values. The framework we develop relies on a blend of ideas from probability theory, linear algebra and optimization theory. We demonstrate that the stability properties of a wide class of biological networks can be assessed from our sufficient theoretical conditions that can be recast as efficient and scalable linear programs, well-known for their tractability. It is notably shown that the computational complexity is often linear in the number of species. We illustrate the validity, the efficiency and the wide applicability of our results on several reaction networks arising in biochemistry, systems biology, epidemiology and ecology. The biological implications of the results as well as an example of a non-ergodic biological network are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2014
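To make the idea in the abstract above more concrete, the following worked equation shows a drift condition of the Foster-Lyapunov type on a toy birth-death network. The network, the rate constants k and γ, and the linear test function V(x) = vx are all assumptions chosen for illustration; the abstract does not state the paper's exact conditions, so this is a sketch of the general style of argument rather than the authors' method.

```latex
% Toy network (assumed): \emptyset \to X at rate k,  X \to \emptyset at rate \gamma x.
% \mathcal{A} is the generator of the continuous-time Markov chain,
% \lambda_j the propensities, \zeta_j the state changes of each reaction.
\begin{align*}
(\mathcal{A}V)(x) &= \sum_{j} \lambda_j(x)\,\bigl[V(x+\zeta_j) - V(x)\bigr] \\
                  &= k\,\bigl[v(x+1) - vx\bigr] + \gamma x\,\bigl[v(x-1) - vx\bigr] \\
                  &= kv - \gamma\,V(x), \qquad V(x) = vx,\quad v > 0 .
\end{align*}
```

Here the inequality (AV)(x) ≤ c1 − c2 V(x) holds with c1 = kv and c2 = γ > 0. For larger networks with a vector v, inequalities of this shape are linear in the entries of v, which suggests why such checks could plausibly be posed as linear programs.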
21. Chemical, Target, and Bioactive Properties of Allosteric Modulation.
- Author
-
van Westen, Gerard J. P., Gaulton, Anna, and Overington, John P.
- Subjects
ALLOSTERIC regulation, PROTEINS, MOLECULAR weights, CHEMICAL libraries, LIGAND binding (Biochemistry), ION channels, NUCLEAR receptors (Biochemistry)
- Abstract
Allosteric modulators are ligands for proteins that exert their effects via a different binding site than the natural (orthosteric) ligand site and hence form a conceptually distinct class of ligands for a target of interest. Here, the physicochemical and structural features of a large set of allosteric and non-allosteric ligands from the ChEMBL database of bioactive molecules are analyzed. In general allosteric modulators are relatively smaller, more lipophilic and more rigid compounds, though large differences exist between different targets and target classes. Furthermore, there are differences in the distribution of targets that bind these allosteric modulators. Allosteric modulators are over-represented in membrane receptors, ligand-gated ion channels and nuclear receptor targets, but are underrepresented in enzymes (primarily proteases and kinases). Moreover, allosteric modulators tend to bind to their targets with a slightly lower potency (5.96 log units versus 6.66 log units, p<0.01). However, this lower absolute affinity is compensated by their lower molecular weight and more lipophilic nature, leading to similar binding efficiency and surface efficiency indices. Subsequently a series of classifier models are trained, initially target class independent models followed by finer-grained target (architecture/functional class) based models using the target hierarchy of the ChEMBL database. Applications of these insights include the selection of likely allosteric modulators from existing compound collections, the design of novel chemical libraries biased towards allosteric regulators and the selection of targets potentially likely to yield allosteric modulators on screening. All data sets used in the paper are available for download. [ABSTRACT FROM AUTHOR]
- Published
- 2014
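The compensation described in the abstract above (lower potency offset by lower molecular weight) can be illustrated with the binding efficiency index, commonly defined as potency in log units divided by molecular weight in kDa. This is a minimal sketch: only the two potency values (5.96 and 6.66) come from the abstract, while the molecular weights are hypothetical numbers picked for illustration and are not figures from the paper.

```python
def binding_efficiency_index(p_activity: float, mol_weight_da: float) -> float:
    """Return BEI = potency (log units) / molecular weight (kDa)."""
    if mol_weight_da <= 0:
        raise ValueError("molecular weight must be positive")
    return p_activity / (mol_weight_da / 1000.0)


# Potencies are from the abstract; the molecular weights (in Da) are
# HYPOTHETICAL values used only to show how a lighter ligand can reach a
# similar efficiency index despite a lower potency.
bei_allosteric = binding_efficiency_index(5.96, 340.0)
bei_non_allosteric = binding_efficiency_index(6.66, 380.0)

print(f"allosteric BEI:     {bei_allosteric:.2f}")
print(f"non-allosteric BEI: {bei_non_allosteric:.2f}")
```

With these assumed weights the two indices come out nearly equal, which is the qualitative effect the abstract reports.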
22. Predicting B cell receptor substitution profiles using public repertoire data.
- Author
-
Dhar, Amrit, Davidsen, Kristian, Matsen IV, Frederick A., and Minin, Vladimir N.
- Subjects
B cell receptors, AMINO acids, GENETIC mutation, CLONING, GERMINAL centers, IMMUNOTECHNOLOGY
- Abstract
B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same “clonal family”) are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called “substitution profiles”, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method “Substitution Profiles Using Related Families” (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package () implementing these ideas and providing easy prediction using our pre-fit models. [ABSTRACT FROM AUTHOR]
- Published
- 2018
23. The effect of cell geometry on polarization in budding yeast.
- Author
-
Trogdon, Michael, Drawert, Brian, Gomez, Carlos, Banavar, Samhita P., Yi, Tau-Mu, Campàs, Otger, and Petzold, Linda R.
- Subjects
SACCHAROMYCES cerevisiae, STEM cells, BIOLOGICAL evolution, GENETIC transcription, SYNTHETIC biology
- Abstract
The localization (or polarization) of proteins on the membrane during the mating of budding yeast (Saccharomyces cerevisiae) is an important model system for understanding simple pattern formation within cells. While there are many existing mathematical models of polarization, for both budding and mating, there are still many aspects of this process that are not well understood. In this paper we set out to elucidate the effect that the geometry of the cell can have on the dynamics of certain models of polarization. Specifically, we look at several spatial stochastic models of Cdc42 polarization that have been adapted from published models, on a variety of tip-shaped geometries, to replicate the shape change that occurs during the growth of the mating projection. We show here that there is a complex interplay between the dynamics of polarization and the shape of the cell. Our results show that while models of polarization can generate a stable polarization cap, its localization at the tip of mating projections is unstable, with the polarization cap drifting away from the tip of the projection in a geometry dependent manner. We also compare predictions from our computational results to experiments that observe cells with projections of varying lengths, and track the stability of the polarization cap. Lastly, we examine one model of actin polarization and show that it is unlikely, at least for the models studied here, that actin dynamics and vesicle traffic are able to overcome this effect of geometry. [ABSTRACT FROM AUTHOR]
- Published
- 2018
24. Structural organization and energy storage in crosslinked actin assemblies.
- Author
-
Ma, Rui and Berro, Julien
- Subjects
ACTIN, CLATHRIN, ENDOCYTOSIS, FIBERS, FIMBRIN, POLYMERIZATION
- Abstract
During clathrin-mediated endocytosis in yeast cells, short actin filaments (< 200 nm) and crosslinking protein fimbrin assemble to drive the internalization of the plasma membrane. However, the organization of the actin meshwork during endocytosis remains largely unknown. In addition, only a small fraction of the force necessary to elongate and pinch off vesicles can be accounted for by actin polymerization alone. In this paper, we used mathematical modeling to study the self-organization of rigid actin filaments in the presence of elastic crosslinkers in conditions relevant to endocytosis. We found that actin filaments condense into either a disordered meshwork or an ordered bundle depending on filament length and the mechanical and kinetic properties of the crosslinkers. Our simulations also demonstrated that these nanometer-scale actin structures can store a large amount of elastic energy within the crosslinkers (up to 10 kBT per crosslinker). This conversion of binding energy into elastic energy is the consequence of geometric constraints created by the helical pitch of the actin filaments, which results in frustrated configurations of crosslinkers attached to filaments. We propose that this stored elastic energy can be used at a later time in the endocytic process. As a proof of principle, we presented a simple mechanism for sustained torque production by ordered detachment of crosslinkers from a pair of parallel filaments. [ABSTRACT FROM AUTHOR]
- Published
- 2018
25. Potassium and sodium microdomains in thin astroglial processes: A computational model study.
- Author
-
Breslin, Kevin, Wade, John Joseph, Harkin, Jim, Flanagan, Bronac, McDaid, Liam, Wong-Lin, KongFatt, Van Zalinge, Harm, Hall, Steve, Walker, Matthew, and Verkhratsky, Alexei
- Subjects
BIOLOGICAL mathematical modeling, HOMEOSTASIS, EXTRACELLULAR space, ASTROCYTES, CENTRAL nervous system, NEURAL transmission, GABA
- Abstract
A biophysical model that captures molecular homeostatic control of ions at the perisynaptic cradle (PsC) is of fundamental importance for understanding the interplay between astroglial and neuronal compartments. In this paper, we develop a multi-compartmental mathematical model which proposes a novel mechanism whereby the flow of cations in thin processes is restricted due to negatively charged membrane lipids which result in the formation of deep potential wells near the dipole heads. These wells restrict the flow of cations to “hopping” between adjacent wells as they transverse the process, and this surface retention of cations will be shown to give rise to the formation of potassium (K+) and sodium (Na+) microdomains at the PsC. We further propose that a K+ microdomain formed at the PsC, provides the driving force for the return of K+ to the extracellular space for uptake by the neurone, thereby preventing K+ undershoot. A slow decay of Na+ was also observed in our simulation after a period of glutamate stimulation which is in strong agreement with experimental observations. The pathological implications of microdomain formation during neuronal excitation are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2018
26. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data.
- Author
-
Dotu, Ivan, Adamson, Scott I., Coleman, Benjamin, Fournier, Cyril, Ricart-Altimiras, Emma, Eyras, Eduardo, and Chuang, Jeffrey H.
- Subjects
IMMUNOPRECIPITATION, RNA-binding proteins, PROTEIN-protein interactions, NUCLEOTIDE sequence, RNA splicing
- Abstract
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. [ABSTRACT FROM AUTHOR]
- Published
- 2018
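The abstract above compares clustering quality using the V-measure, which is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls in a single cluster). The following is a minimal from-scratch sketch of that metric in plain Python; the function names and the toy labels are illustrative assumptions, and nothing here reproduces the paper's data or evaluation code.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    h_true = _entropy(true_labels)
    h_clus = _entropy(cluster_labels)
    # By convention each score is 1.0 when its normalising entropy is zero.
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - _conditional_entropy(true_labels, cluster_labels) / h_true)
    completeness = 1.0 if h_clus == 0 else (
        1.0 - _conditional_entropy(cluster_labels, true_labels) / h_clus)
    if homogeneity + completeness == 0:
        return 0.0
    return 2.0 * homogeneity * completeness / (homogeneity + completeness)


# Toy labels (hypothetical): cluster ids differ from class ids but the
# grouping is identical, so the score does not depend on label names.
print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the score depends only on how items are grouped, it can compare methods that output arbitrary cluster identifiers, which is presumably why it suits a multi-motif clustering benchmark.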
27. In silico study of multicellular automaticity of heterogeneous cardiac cell monolayers: Effects of automaticity strength and structural linear anisotropy.
- Author
-
Duverger, James Elber, Jacquemet, Vincent, Vinet, Alain, and Comtois, Philippe
- Subjects
HEART cells, AUTOMATICITY (Learning process), ANISOTROPY, MYOCARDIUM, PACEMAKER cells
- Abstract
The biological pacemaker approach is an alternative to cardiac electronic pacemakers. Its main objective is to create pacemaking activity from added or modified distribution of spontaneous cells in the myocardium. This paper aims to assess how automaticity strength of pacemaker cells (i.e. their ability to maintain robust spontaneous activity with fast rate and to drive neighboring quiescent cells) and structural linear anisotropy, combined with density and spatial distribution of pacemaker cells, may affect the macroscopic behavior of the biological pacemaker. A stochastic algorithm was used to randomly distribute pacemaker cells, with various densities and spatial distributions, in a semi-continuous mathematical model. Simulations of the model showed that stronger automaticity allows onset of spontaneous activity for lower densities and more homogeneous spatial distributions, displayed more central foci, less variability in cycle lengths and synchronization of electrical activation for similar spatial patterns, but more variability in those same variables for dissimilar spatial patterns. Compared to their isotropic counterparts, in silico anisotropic monolayers had less central foci and displayed more variability in cycle lengths and synchronization of electrical activation for both similar and dissimilar spatial patterns. The present study established a link between microscopic structure and macroscopic behavior of the biological pacemaker, and may provide crucial information for optimized biological pacemaker therapies. [ABSTRACT FROM AUTHOR]
- Published
- 2018
28. Modeling the interactions of sense and antisense Period transcripts in the mammalian circadian clock network.
- Author
-
Battogtokh, Dorjsuren, Kojima, Shihoko, and Tyson, John J.
- Subjects
CIRCADIAN rhythms, ANTISENSE RNA, MESSENGER RNA, COMPUTATIONAL biology, BIOINFORMATICS
- Abstract
In recent years, it has become increasingly apparent that antisense transcription plays an important role in the regulation of gene expression. The circadian clock is no exception: an antisense transcript of the mammalian core-clock gene PERIOD2 (PER2), which we shall refer to as Per2AS RNA, oscillates with a circadian period and a nearly 12 h phase shift from the peak expression of Per2 mRNA. In this paper, we ask whether Per2AS plays a regulatory role in the mammalian circadian clock by studying in silico the potential effects of interactions between Per2 and Per2AS RNAs on circadian rhythms. Based on the antiphasic expression pattern, we consider two hypotheses about how Per2 and Per2AS mutually interfere with each other's expression. In our pre-transcriptional model, the transcription of Per2AS RNA from the non-coding strand represses the transcription of Per2 mRNA from the coding strand and vice versa. In our post-transcriptional model, Per2 and Per2AS transcripts form a double-stranded RNA duplex, which is rapidly degraded. To study these two possible mechanisms, we have added terms describing our alternative hypotheses to a published mathematical model of the molecular regulatory network of the mammalian circadian clock. Our pre-transcriptional model predicts that transcriptional interference between Per2 and Per2AS can generate alternative modes of circadian oscillations, which we characterize in terms of the amplitude and phase of oscillation of core clock genes. In our post-transcriptional model, Per2/Per2AS duplex formation dampens the circadian rhythm. In a model that combines pre- and post-transcriptional controls, the period, amplitude and phase of circadian proteins exhibit non-monotonic dependencies on the rate of expression of Per2AS. All three models provide potential explanations of the observed antiphasic, circadian oscillations of Per2 and Per2AS RNAs. They make discordant predictions that can be tested experimentally in order to distinguish among these alternative hypotheses. [ABSTRACT FROM AUTHOR]
- Published
- 2018
29. Cracking AlphaFold2: Leveraging the power of artificial intelligence in undergraduate biochemistry curriculums.
- Author
-
Boland, Devon J. and Ayres, Nicola M.
- Subjects
ARTIFICIAL intelligence, AMINO acid sequence, PROTEIN structure prediction, BIOCHEMISTRY, PROTEIN structure
- Abstract
AlphaFold2 is an Artificial Intelligence-based program developed to predict the 3D structure of proteins given only their amino acid sequence at atomic resolution. Due to the accuracy and efficiency at which AlphaFold2 can generate 3D structure predictions and its widespread adoption into various aspects of biochemical research, the technique of protein structure prediction should be considered for incorporation into the undergraduate biochemistry curriculum. A module for introducing AlphaFold2 into a senior-level biochemistry laboratory classroom was developed. The module's focus was to have students predict the structures of proteins from the MPOX 22 global outbreak virus isolate genome, which had no structures elucidated at that time. The goal of this study was to both determine the impact the module had on students and to develop a framework for introducing AlphaFold2 into the undergraduate curriculum so that instructors for biochemistry courses, regardless of their background in bioinformatics, could adapt the module into their classrooms. Author summary: AlphaFold2 is software that combines sequence similarity and structure templating with the power of Artificial Intelligence (AI) to bridge the connection between the primary protein structure (amino acid sequence) and higher-level 3D structure (secondary, tertiary, and quaternary). AlphaFold2's impressive and easily accessible nature makes it a bioinformatics tool that has been seeing a wide range of applications in biochemical research. Given this large-scale application, we examined whether AlphaFold2 could be integrated into an undergraduate curriculum. We developed a novel module for a senior-level undergraduate biochemistry laboratory class. Our goal was to lay a solid foundation for other undergraduate instructors to be able to adapt this module to fit their classroom needs. While we implemented and ran all predictions on an internal university computing cluster, we recommend ColabFold for those instructors who do not have access to large-scale computational clusters or whose internal clusters cannot scale to their classroom sizes. We have outlined 3 metrics to be quantitatively investigated in the module to give both instructors and students metrics to evaluate model confidence. We have also included a template worksheet, lecture slides, and example scripts to enable instructors to rapidly develop a similar module. We hope that more departments and programs will integrate AlphaFold2 into their undergraduate curriculums, giving students a highly in-demand skill to prepare them for their transition into a career or graduate school. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
- Author
-
Gabasova, Evelina, Reid, John, and Wernisch, Lorenz
- Subjects
GENE expression, DNA copy number variations, MICRORNA, DNA methylation, PROTEOMICS
- Abstract
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2017
31. The role of glutamate in neuronal ion homeostasis: A case study of spreading depolarization.
- Author
-
Hübel, Niklas, Hosseini-Zare, Mahshid S., Žiburkus, Jokūbas, and Ullah, Ghanim
- Subjects
GLUTAMIC acid, HOMEOSTASIS, NEUROCHEMISTRY, NEUROTRANSMITTERS, CYTOLOGY, NEURONS, THERAPEUTICS
- Abstract
Simultaneous changes in ion concentrations, glutamate, and cell volume together with exchange of matter between cell network and vasculature are ubiquitous in numerous brain pathologies. A complete understanding of pathological conditions as well as normal brain function, therefore, hinges on elucidating the molecular and cellular pathways involved in these mostly interdependent variations. In this paper, we develop the first computational framework that combines the Hodgkin–Huxley type spiking dynamics, dynamic ion concentrations and glutamate homeostasis, neuronal and astroglial volume changes, and ion exchange with vasculature into a comprehensive model to elucidate the role of glutamate uptake in the dynamics of spreading depolarization (SD)—the electrophysiological event underlying numerous pathologies including migraine, ischemic stroke, aneurysmal subarachnoid hemorrhage, intracerebral hematoma, and trauma. We are particularly interested in investigating the role of glutamate in the duration and termination of SD caused by K+ perfusion and oxygen-glucose deprivation. Our results demonstrate that glutamate signaling plays a key role in the dynamics of SD, and that impaired glutamate uptake leads to recovery failure of neurons from SD. We confirm predictions from our model experimentally by showing that inhibiting astrocytic glutamate uptake using TFB-TBOA nearly quadruples the duration of SD in layers 2-3 of visual cortical slices from juvenile rats. The model equations are either derived purely from first physical principles of electroneutrality, osmosis, and conservation of particles or a combination of these principles and known physiological facts. Accordingly, we claim that our approach can be used as a future guide to investigate the role of glutamate, ion concentrations, and dynamics cell volume in other brain pathologies and normal brain function. [ABSTRACT FROM AUTHOR]
- Published
- 2017
32. PCSF: An R-package for network-based interpretation of high-throughput data.
- Author
-
Akhmedov, Murodzhon, Kedaigle, Amanda, Chong, Renan Escalante, Montemanni, Roberto, Bertoni, Francesco, Fraenkel, Ernest, and Kwee, Ivo
- Subjects
BIOINFORMATICS software, DATA analysis software, MATHEMATICAL optimization, COMPUTATIONAL biology, PROTEIN-protein interactions
- Abstract
With the recent technological developments a vast amount of high-throughput data has been profiled to understand the mechanism of complex diseases. The current bioinformatics challenge is to interpret the data and underlying biology, where efficient algorithms for analyzing heterogeneous high-throughput data using biological networks are becoming increasingly valuable. In this paper, we propose a software package based on the Prize-collecting Steiner Forest graph optimization approach. The PCSF package performs fast and user-friendly network analysis of high-throughput data by mapping the data onto biological networks such as protein-protein interaction, gene-gene interaction or any other correlation or coexpression based networks. Using the interaction networks as a template, it determines high-confidence subnetworks relevant to the data, which potentially leads to predictions of functional units. It also interactively visualizes the resulting subnetwork with functional enrichment analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2017
33. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.
- Author
-
Cai, Yunpeng, Zheng, Wei, Yao, Jin, Yang, Yujie, Mai, Volker, Mao, Qi, and Sun, Yijun
- Subjects
GENOMICS, QUADRATIC programming, HUMAN microbiota, RIBOSOMAL RNA, BIOACCUMULATION
- Abstract
The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step to perform in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, it is currently computationally expensive to perform hierarchical clustering of extremely large sequence datasets due to its quadratic time and space complexities. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the human microbiome project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence dataset. Our experiment demonstrated that with the power of parallel computing it is now computationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2017
34. Classification and adaptive behavior prediction of children with autism spectrum disorder based upon multivariate data analysis of markers of oxidative stress and DNA methylation.
- Author
-
Howsmon, Daniel P., Kruger, Uwe, Melnyk, Stepan, James, S. Jill, and Hahn, Juergen
- Subjects
AUTISTIC children ,ADAPTABILITY (Personality) ,MULTIVARIATE analysis ,OXIDATIVE stress ,DNA methylation
- Abstract
The number of diagnosed cases of Autism Spectrum Disorders (ASD) has increased dramatically over the last four decades; however, there is still considerable debate regarding the underlying pathophysiology of ASD. This lack of biological knowledge restricts diagnoses to be made based on behavioral observations and psychometric tools. However, physiological measurements should support these behavioral diagnoses in the future in order to enable earlier and more accurate diagnoses. Stepping towards this goal of incorporating biochemical data into ASD diagnosis, this paper analyzes measurements of metabolite concentrations of the folate-dependent one-carbon metabolism and transsulfuration pathways taken from blood samples of 83 participants with ASD and 76 age-matched neurotypical peers. Fisher Discriminant Analysis enables multivariate classification of the participants as on the spectrum or neurotypical which results in 96.1% of all neurotypical participants being correctly identified as such while still correctly identifying 97.6% of the ASD cohort. Furthermore, kernel partial least squares is used to predict adaptive behavior, as measured by the Vineland Adaptive Behavior Composite score, where measurement of five metabolites of the pathways was sufficient to predict the Vineland score with an R² of 0.45 after cross-validation. This level of accuracy for classification as well as severity prediction far exceeds any other approach in this field and is a strong indicator that the metabolites under consideration are strongly correlated with an ASD diagnosis but also that the statistical analysis used here offers tremendous potential for extracting important information from complex biochemical data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2017
35. Bayesian phylogeography of influenza A/H3N2 for the 2014-15 season in the United States using three frameworks of ancestral state reconstruction.
- Author
-
Magee, Daniel, Suchard, Marc A., and Scotch, Matthew
- Subjects
INFLUENZA A virus, H3N2 subtype ,BAYESIAN analysis ,PHYLOGEOGRAPHY ,PANDEMICS
- Abstract
Ancestral state reconstruction in Bayesian phylogeography of virus pandemics has been improved by utilizing a Bayesian stochastic search variable selection (BSSVS) framework. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of genetic, geographic, demographic, and environmental predictors of interest to the virus and incorporating BSSVS to estimate the posterior inclusion probabilities of each predictor. Although the latter appears to enhance the biological validity of ancestral state reconstruction, there has yet to be a comparison of phylogenies created by the two methods. In this paper, we compare these two methods, while also using a primitive method without BSSVS, and highlight the differences in phylogenies created by each. We test six coalescent priors and six random sequence samples of H3N2 influenza during the 2014–15 flu season in the U.S. We show that the GLMs yield significantly greater root state posterior probabilities than the two alternative methods under five of the six priors, and significantly greater Kullback-Leibler divergence values than the two alternative methods under all priors. Furthermore, the GLMs strongly implicate temperature and precipitation as driving forces of this flu season and nearly unanimously identified a single root state, which exhibits the most tropical climate during a typical flu season in the U.S. The GLM, however, appears to be highly susceptible to sampling bias compared with the other methods, which casts doubt on whether its reconstructions should be favored over those created by alternate methods. We report that a BSSVS approach with a Poisson prior demonstrates less bias toward sample size under certain conditions than the GLMs or primitive models, and believe that the connection between reconstruction method and sampling bias warrants further investigation. [ABSTRACT FROM AUTHOR]
- Published
- 2017
36. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
- Author
-
Wang, Sheng, Sun, Siqi, Li, Zhen, Zhang, Renyu, and Xu, Jinbo
- Subjects
PROTEIN structure ,ARTIFICIAL neural networks ,PROTEIN folding ,PAIRED comparisons (Mathematics) ,AMINO acid sequence
- Abstract
Motivation: Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain high-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: [ABSTRACT FROM AUTHOR]
- Published
- 2017
37. Stability of Cross-Feeding Polymorphisms in Microbial Communities.
- Author
-
Gudelj, Ivana, Kinnersley, Margie, Rashkov, Peter, Schmidt, Karen, and Rosenzweig, Frank
- Subjects
MICROBIAL metabolism ,SYNTROPHISM ,BIOLOGICAL evolution ,MICROBIAL metabolites ,METABOLISM in escherichia ,POPULATION differentiation
- Abstract
Cross-feeding, a relationship wherein one organism consumes metabolites excreted by another, is a ubiquitous feature of natural and clinically-relevant microbial communities and could be a key factor promoting diversity in extreme and/or nutrient-poor environments. However, it remains unclear how readily cross-feeding interactions form, and therefore our ability to predict their emergence is limited. In this paper we developed a mathematical model parameterized using data from the biochemistry and ecology of an E. coli cross-feeding laboratory system. The model accurately captures short-term dynamics of the two competitors that have been observed empirically and we use it to systematically explore the stability of cross-feeding interactions for a range of environmental conditions. We find that our simple system can display complex dynamics including multi-stable behavior separated by a critical point. Therefore whether cross-feeding interactions form depends on the complex interplay between density and frequency of the competitors as well as on the concentration of resources in the environment. Moreover, we find that subtly different environmental conditions can lead to dramatically different results regarding the establishment of cross-feeding, which could explain the apparently unpredictable between-population differences in experimental outcomes. We argue that mathematical models are essential tools for disentangling the complexities of cross-feeding interactions. [ABSTRACT FROM AUTHOR]
- Published
- 2016
38. Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota.
- Author
-
Raguideau, Sébastien, Plancade, Sandra, Pons, Nicolas, Leclerc, Marion, and Laroche, Béatrice
- Subjects
NONNEGATIVE matrices ,ENTEROTYPES ,METAGENOMICS ,MICROBIAL ecology ,GENOMES
- Abstract
Whole Genome Shotgun (WGS) metagenomics is increasingly used to study the structure and functions of complex microbial ecosystems, both from the taxonomic and functional point of view. Gene inventories of otherwise uncultured microbial communities make the direct functional profiling of microbial communities possible. The concept of community aggregated trait has been adapted from environmental and plant functional ecology to the framework of microbial ecology. Community aggregated traits are quantified from WGS data by computing the abundance of relevant marker genes. They can be used to study key processes at the ecosystem level and correlate environmental factors and ecosystem functions. In this paper we propose a novel model based approach to infer combinations of aggregated traits characterizing specific ecosystemic metabolic processes. We formulate a model of these Combined Aggregated Functional Traits (CAFTs) accounting for a hierarchical structure of genes, which are associated on microbial genomes, further linked at the ecosystem level by complex co-occurrences or interactions. The model is completed with constraints specifically designed to exploit available genomic information, in order to favor biologically relevant CAFTs. The CAFTs structure, as well as their intensity in the ecosystem, is obtained by solving a constrained Non-negative Matrix Factorization (NMF) problem. We developed a multicriteria selection procedure for the number of CAFTs. We illustrated our method on the modelling of ecosystemic functional traits of fiber degradation by the human gut microbiota. We used 1408 samples of gene abundances from several high-throughput sequencing projects and found that four CAFTs only were needed to represent the fiber degradation potential. This data reduction highlighted biologically consistent functional patterns while providing a high quality preservation of the original data. 
Our method is generic and can be applied to other metabolic processes in the gut or in other ecosystems. [ABSTRACT FROM AUTHOR]
- Published
- 2016
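The factorization step named in the abstract of result 38 can be illustrated with a minimal sketch of plain, unconstrained NMF using multiplicative updates. This is not the authors' constrained CAFT model: the genomic constraints and the selection procedure for the number of traits are omitted, and the toy matrix, function names, and parameters are assumptions made purely for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frob_error(x, w, h):
    """Squared Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF: factor a non-negative X (samples x genes) into W H.

    W holds per-sample trait intensities, H holds per-trait gene weights.
    Multiplicative updates keep every entry non-negative.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy abundance matrix: 4 samples x 3 marker genes.
x = [[1, 2, 0], [2, 4, 0], [0, 1, 3], [1, 3, 3]]
w, h = nmf(x, rank=2)
print("reconstruction error:", frob_error(x, w, h))
```

In a setting like the one described, each row of `h` would play the role of one aggregated trait and each row of `w` its intensity in one sample; the constraints that make the traits biologically meaningful would have to be added on top of this sketch.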
39. A Stochastic Model of the Yeast Cell Cycle Reveals Roles for Feedback Regulation in Limiting Cellular Variability.
- Author
-
Barik, Debashis, Ball, David A., Peccoud, Jean, and Tyson, John J.
- Subjects
CELL cycle ,CELL division ,CYCLIN-dependent kinases ,PROTEIN kinases ,CYCLINS
- Abstract
The cell division cycle of eukaryotes is governed by a complex network of cyclin-dependent protein kinases (CDKs) and auxiliary proteins that govern CDK activities. The control system must function reliably in the context of molecular noise that is inevitable in tiny yeast cells, because mistakes in sequencing cell cycle events are detrimental or fatal to the cell or its progeny. To assess the effects of noise on cell cycle progression requires not only extensive, quantitative, experimental measurements of cellular heterogeneity but also comprehensive, accurate, mathematical models of stochastic fluctuations in the CDK control system. In this paper we provide a stochastic model of the budding yeast cell cycle that accurately accounts for the variable phenotypes of wild-type cells and more than 20 mutant yeast strains simulated in different growth conditions. We specifically tested the role of feedback regulations mediated by G1- and SG2M-phase cyclins to minimize the noise in cell cycle progression. Details of the model are informed and tested by quantitative measurements (by fluorescence in situ hybridization) of the joint distributions of mRNA populations in yeast cells. We use the model to predict the phenotypes of ~30 mutant yeast strains that have not yet been characterized experimentally. [ABSTRACT FROM AUTHOR]
- Published
- 2016
40. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics.
- Author
-
Tang, Haixu, Li, Sujun, and Ye, Yuzhen
- Subjects
PROTEIN expression ,PEPTIDES ,GENES ,MICROBIOLOGICAL chemistry ,METAGENOMICS
- Abstract
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at . [ABSTRACT FROM AUTHOR]
- Published
- 2016
41. In Silico Knockout Studies of Xenophagic Capturing of Salmonella.
- Author
-
Scheidel, Jennifer, Amstein, Leonie, Ackermann, Jörg, Dikic, Ivan, and Koch, Ina
- Subjects
SALMONELLA detection ,SALMONELLA typhimurium ,PATHOGENIC microorganisms ,EPITHELIAL cells ,PLANT vacuoles
- Abstract
The degradation of cytosol-invading pathogens by autophagy, a process known as xenophagy, is an important mechanism of the innate immune system. Inside the host, Salmonella Typhimurium invades epithelial cells and resides within a specialized intracellular compartment, the Salmonella-containing vacuole. A fraction of these bacteria does not persist inside the vacuole and enters the host cytosol. Salmonella Typhimurium that invades the host cytosol becomes a target of the autophagy machinery for degradation. The xenophagy pathway has recently been discovered, and the exact molecular processes are not entirely characterized. Complete kinetic data for each molecular process is not available, so far. We developed a mathematical model of the xenophagy pathway to investigate this key defense mechanism. In this paper, we present a Petri net model of Salmonella xenophagy in epithelial cells. The model is based on functional information derived from literature data. It comprises the molecular mechanism of galectin-8-dependent and ubiquitin-dependent autophagy, including regulatory processes, like nutrient-dependent regulation of autophagy and TBK1-dependent activation of the autophagy receptor, OPTN. To model the activation of TBK1, we proposed a new mechanism of TBK1 activation, suggesting a spatial and temporal regulation of this process. Using standard Petri net analysis techniques, we found basic functional modules, which describe different pathways of the autophagic capture of Salmonella and reflect the basic dynamics of the system. To verify the model, we performed in silico knockout experiments. We introduced a new concept of knockout analysis to systematically compute and visualize the results, using an in silico knockout matrix. The results of the in silico knockout analyses were consistent with published experimental results and provide a basis for future investigations of the Salmonella xenophagy pathway. [ABSTRACT FROM AUTHOR]
- Published
- 2016
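The idea of an in silico knockout matrix mentioned in the abstract of result 41 can be sketched on a toy Petri net. The net below is hypothetical and is not the published xenophagy model: place names are invented, token counts are ignored (a place is simply "reachable" or not), and a knockout is modeled as a place that can never be marked.

```python
def reachable(transitions, initial, knocked_out=frozenset()):
    """Return the set of places that can ever become marked.

    transitions: list of (input_places, output_places) tuples.
    A transition can fire once all its input places are marked.
    Knocked-out places are never marked, so anything that depends
    on them alone is lost.
    """
    blocked = set(knocked_out)
    marked = set(initial) - blocked
    changed = True
    while changed:
        changed = False
        for inputs, outputs in transitions:
            if set(inputs) <= marked:
                new = set(outputs) - blocked - marked
                if new:
                    marked |= new
                    changed = True
    return marked


def knockout_matrix(transitions, initial, places):
    """Rows: knocked-out place. Columns: is each place still reachable?"""
    return {ko: {p: p in reachable(transitions, initial, {ko}) for p in places}
            for ko in places}


# Hypothetical toy net: two redundant tagging routes lead to capture,
# and both routes need a shared adaptor.
transitions = [
    (("pathogen",), ("tag_a",)),
    (("pathogen",), ("tag_b",)),
    (("tag_a", "adaptor"), ("captured",)),
    (("tag_b", "adaptor"), ("captured",)),
]
initial = {"pathogen", "adaptor"}
places = ["pathogen", "adaptor", "tag_a", "tag_b", "captured"]

matrix = knockout_matrix(transitions, initial, places)
for ko in places:
    print(ko, "->", "captured" if matrix[ko]["captured"] else "not captured")
```

Reading the matrix row by row shows which single knockouts are compensated by a redundant route and which abolish the outcome, which is the kind of comparison against experimental knockouts the abstract describes.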
42. Computational Discovery of Putative Leads for Drug Repositioning through Drug-Target Interaction Prediction.
- Author
-
Coelho, Edgar D., Arrais, Joel P., and Oliveira, José Luís
- Subjects
BACTERIAL organelles ,ORGANISMS ,DRUG development ,MOLECULAR docking ,DRUGS
- Abstract
De novo experimental drug discovery is an expensive and time-consuming task. It requires the identification of drug-target interactions (DTIs) towards targets of biological interest, either to inhibit or enhance a specific molecular function. Dedicated computational models for protein simulation and DTI prediction are crucial for speed and to reduce the costs associated with DTI identification. In this paper we present a computational pipeline that enables the discovery of putative leads for drug repositioning that can be applied to any microbial proteome, as long as the interactome of interest is at least partially known. Network metrics calculated for the interactome of the bacterial organism of interest were used to identify putative drug-targets. Then, a random forest classification model for DTI prediction was constructed using known DTI data from publicly available databases, resulting in an area under the ROC curve of 0.91 for classification of out-of-sampling data. A drug-target network was created by combining 3,081 unique ligands and the expected ten best drug targets. This network was used to predict new DTIs and to calculate the probability of the positive class, allowing the scoring of the predicted instances. Molecular docking experiments were performed on the best scoring DTI pairs and the results were compared with those of the same ligands with their original targets. The results obtained suggest that the proposed pipeline can be used in the identification of new leads for drug repositioning. The proposed classification model is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2016
43. An Integrative Approach to Computational Modelling of the Gene Regulatory Network Controlling Clostridium botulinum Type A1 Toxin Production.
- Author
-
Ihekwaba, Adaoha E. C., Peck, Michael W., Barker, Gary C., Walshaw, John, and Mura, Ivan
- Subjects
CLOSTRIDIUM botulinum ,BOTULINUM toxin ,BOTULISM ,GENE regulatory networks ,RISK assessment
- Abstract
Clostridium botulinum produces botulinum neurotoxins (BoNTs), highly potent substances responsible for botulism. Currently, mathematical models of C. botulinum growth and toxigenesis are largely aimed at risk assessment and do not include explicit genetic information beyond group level but integrate many component processes, such as signalling, membrane permeability and metabolic activity. In this paper we present a scheme for modelling neurotoxin production in C. botulinum Group I type A1, based on the integration of diverse information coming from experimental results available in the literature. Experiments show that production of BoNTs depends on the growth-phase and is under the control of positive and negative regulatory elements at the intracellular level. Toxins are released as large protein complexes and are associated with non-toxic components. Here, we systematically review and integrate those regulatory elements previously described in the literature for C. botulinum Group I type A1 into a population dynamics model, to build the very first computational model of toxin production at the molecular level. We conduct a validation of our model against several items of published experimental data for different wild type and mutant strains of C. botulinum Group I type A1. The result of this process underscores the potential of mathematical modelling at the cellular level, as a means of creating opportunities in developing new strategies that could be used to prevent botulism; and potentially contribute to improved methods for the production of toxin that is used for therapeutics. [ABSTRACT FROM AUTHOR]
- Published
- 2016
44. Likelihood-Based Inference of B Cell Clonal Families.
- Author
-
Ralph, Duncan K. and Matsen IV, Frederick A.
- Subjects
B cells ,CLONE cells ,B cell receptors ,MARKOV processes
- Abstract
The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called “rearrangement” forming progenitor B cells, then a Darwinian process of lineage diversification and selection called “affinity maturation.” The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous, or may consist of only a single member. As a step to understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e. to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem “clonal family inference.” In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
45. Learning Reward Uncertainty in the Basal Ganglia.
- Author
-
Mikhael, John G. and Bogacz, Rafal
- Subjects
BASAL ganglia ,SYNAPSES ,NEURONS ,DOPAMINE ,NEUROPLASTICITY
- Abstract
Learning the reliability of different sources of rewards is critical for making optimal choices. However, despite the existence of detailed theory describing how the expected reward is learned in the basal ganglia, it is not known how reward uncertainty is estimated in these circuits. This paper presents a class of models that encode both the mean reward and the spread of the rewards, the former in the difference between the synaptic weights of D1 and D2 neurons, and the latter in their sum. In the models, the tendency to seek (or avoid) options with variable reward can be controlled by increasing (or decreasing) the tonic level of dopamine. The models are consistent with the physiology of and synaptic plasticity in the basal ganglia, they explain the effects of dopaminergic manipulations on choices involving risks, and they make multiple experimental predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2016
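The encoding described in the abstract of result 45 (mean reward in the difference of D1 and D2 weights, reward spread in their sum) can be illustrated with a small simulation. The update rule below is an illustrative assumption and is not taken from the abstract: positive prediction errors strengthen a hypothetical D1 weight `g`, negative ones strengthen a hypothetical D2 weight `n`, and both decay slowly. The learning rates and reward distributions are likewise made up.

```python
import random


def learn(rewards, alpha=0.05, beta=0.01):
    """Track two non-negative weights from a stream of rewards.

    g: hypothetical D1 ("Go") weight, driven by positive prediction errors.
    n: hypothetical D2 ("NoGo") weight, driven by negative prediction errors.
    The difference g - n acts as the reward estimate; the sum g + n grows
    with the typical size of the prediction errors.
    """
    g = n = 0.0
    for r in rewards:
        delta = r - (g - n)                      # prediction error
        g += alpha * max(delta, 0.0) - beta * g  # potentiation minus decay
        n += alpha * max(-delta, 0.0) - beta * n
    return g, n


rng = random.Random(0)
# Two hypothetical reward sources with the same mean but different spread.
narrow = [rng.gauss(1.0, 0.1) for _ in range(5000)]
wide = [rng.gauss(1.0, 2.0) for _ in range(5000)]

for name, rewards in (("narrow", narrow), ("wide", wide)):
    g, n = learn(rewards)
    print(name, "difference:", round(g - n, 2), "sum:", round(g + n, 2))
```

Under this assumed rule the difference settles near a scaled version of the mean reward for both sources (the decay term shrinks it by a factor of alpha / (alpha + beta)), while the sum is markedly larger for the more variable source.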
46. Quorum-Sensing Synchronization of Synthetic Toggle Switches: A Design Based on Monotone Dynamical Systems Theory.
- Author
-
Nikolaev, Evgeni V. and Sontag, Eduardo D.
- Subjects
QUORUM sensing ,CELL populations ,SYNTHETIC biology ,GENE expression ,PHYSICAL sciences ,FLOW cytometry
- Abstract
Synthetic constructs in biotechnology, biocomputing, and modern gene therapy interventions are often based on plasmids or transfected circuits which implement some form of “on-off” switch. For example, the expression of a protein used for therapeutic purposes might be triggered by the recognition of a specific combination of inducers (e.g., antigens), and memory of this event should be maintained across a cell population until a specific stimulus commands a coordinated shut-off. The robustness of such a design is hampered by molecular (“intrinsic”) or environmental (“extrinsic”) noise, which may lead to spontaneous changes of state in a subset of the population and is reflected in the bimodality of protein expression, as measured for example using flow cytometry. In this context, a “majority-vote” correction circuit, which brings deviant cells back into the required state, is highly desirable, and quorum-sensing has been suggested as a way for cells to broadcast their states to the population as a whole so as to facilitate consensus. In this paper, we propose what we believe is the first such design that has mathematically guaranteed properties of stability and auto-correction under certain conditions. Our approach is guided by concepts and theory from the field of “monotone” dynamical systems developed by M. Hirsch, H. Smith, and others. We benchmark our design by comparing it to an existing design which has been the subject of experimental and theoretical studies, illustrating its superiority in stability and self-correction of synchronization errors. Our stability analysis, based on dynamical systems theory, guarantees global convergence to steady states, ruling out unpredictable (“chaotic”) behaviors and even sustained oscillations in the limit of convergence. These results are valid no matter what the values of the parameters are, and are based only on the wiring diagram. 
The theory is complemented by extensive computational bifurcation analysis, performed for a biochemically-detailed and biologically-relevant model that we developed. Another novel feature of our approach is that our theorems on exponential stability of steady states for homogeneous or mixed populations are valid independently of the number N of cells in the population, which is usually very large (N ≫ 1) and unknown. We prove that the exponential stability depends on relative proportions of each type of state only. While monotone systems theory has been used previously for systems biology analysis, the current work illustrates its power for synthetic biology design, and thus has wider significance well beyond the application to the important problem of coordination of toggle switches. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
47. Systems Approaches to the Eukaryotic Stress Response.
- Author
-
Vogel, Christine
- Subjects
PHYSIOLOGICAL stress ,EUKARYOTES ,PROTEIN expression ,PROJECT management ,MOTIVATION (Psychology) ,RESEARCH personnel
- Abstract
The author discusses aspects of her career and systems approaches to the stress response in eukaryotes. Topics include protein expression associated with environmental stress, project management, and motivation for laboratory members. She also offers advice to beginning researchers and principal investigators (PIs).
- Published
- 2016
48. Simulation and Theory of Antibody Binding to Crowded Antigen-Covered Surfaces.
- Author
-
De Michele, Cristiano, De Los Rios, Paolo, Foffi, Giuseppe, and Piazza, Francesco
- Subjects
ANTIGENS ,IMMUNOGLOBULIN G ,SURFACE plasmon resonance ,HAPTENS ,IMMUNE system
- Abstract
In this paper we introduce a fully flexible coarse-grained model of immunoglobulin G (IgG) antibodies parametrized directly on cryo-EM data and simulate the binding dynamics of many IgGs to antigens adsorbed on a surface at increasing densities. Moreover, we work out a theoretical model that allows us to explain all the features observed in the simulations. Our combined computational and theoretical framework is in excellent agreement with surface-plasmon resonance data and allows us to establish a number of important results. (i) Internal flexibility is key to maximize bivalent binding, flexible IgGs being able to explore the surface with their second arm in search of an available hapten. This is made clear by the strongly reduced ability to bind with both arms displayed by artificial IgGs designed to rigidly keep a prescribed shape. (ii) The large size of IgGs is instrumental to keep neighboring molecules at a certain distance (surface repulsion), which essentially makes antigens within reach of the second Fab always unoccupied on average. (iii) One needs to account independently for the thermodynamic and geometric factors that regulate the binding equilibrium. The key geometrical parameters, besides excluded-volume repulsion, describe the screening of free haptens by neighboring bound antibodies. We prove that the thermodynamic parameters govern the low-antigen-concentration regime, while the surface screening and repulsion only affect the binding at high hapten densities. Importantly, we prove that screening effects are concealed in relative measures, such as the fraction of bivalently bound antibodies. Overall, our model provides a valuable, accurate theoretical paradigm beyond existing frameworks to interpret experimental profiles of antibodies binding to multi-valent surfaces of different sorts in many contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2016
49. Dynamical Allocation of Cellular Resources as an Optimal Control Problem: Novel Insights into Microbial Growth Strategies.
- Author
-
Giordano, Nils, Mairet, Francis, Gouzé, Jean-Luc, Geiselmann, Johannes, and de Jong, Hidde
- Subjects
MICROBIAL physiology ,MICROBIAL growth ,MACROMOLECULAR dynamics ,PONTRYAGIN'S minimum principle ,ESCHERICHIA coli
- Abstract
Microbial physiology exhibits growth laws that relate the macromolecular composition of the cell to the growth rate. Recent work has shown that these empirical regularities can be derived from coarse-grained models of resource allocation. While these studies focus on steady-state growth, such conditions are rarely found in natural habitats, where microorganisms are continually challenged by environmental fluctuations. The aim of this paper is to extend the study of microbial growth strategies to dynamical environments, using a self-replicator model. We formulate dynamical growth maximization as an optimal control problem that can be solved using Pontryagin’s Maximum Principle. We compare this theoretical gold standard with different possible implementations of growth control in bacterial cells. We find that simple control strategies enabling growth-rate maximization at steady state are suboptimal for transitions from one growth regime to another, for example when shifting bacterial cells to a medium supporting a higher growth rate. A near-optimal control strategy in dynamical conditions is shown to require information on several, rather than a single physiological variable. Interestingly, this strategy has structural analogies with the regulation of ribosomal protein synthesis by ppGpp in the enterobacterium Escherichia coli. It involves sensing a mismatch between precursor and ribosome concentrations, as well as the adjustment of ribosome synthesis in a switch-like manner. Our results show how the capability of regulatory systems to integrate information about several physiological variables is critical for optimizing growth in a changing environment. [ABSTRACT FROM AUTHOR]
- Published
- 2016
50. Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction.
- Author
-
Liu, Yong, Wu, Min, Miao, Chunyan, Zhao, Peilin, and Li, Xiao-Li
- Subjects
TARGETED drug delivery, DRUG interactions, FACTORIZATION, DRUG design, PREDICTION models
- Abstract
In pharmaceutical sciences, a crucial step of the drug discovery process is the identification of drug-target interactions. However, only a small portion of the drug-target interactions have been experimentally validated, as the experimental validation is laborious and costly. To improve the drug discovery efficiency, there is a great need for the development of accurate computational approaches that can predict potential drug-target interactions to direct the experimental verification. In this paper, we propose a novel drug-target interaction prediction algorithm, namely neighborhood regularized logistic matrix factorization (NRLMF). Specifically, the proposed NRLMF method focuses on modeling the probability that a drug would interact with a target by logistic matrix factorization, where the properties of drugs and targets are represented by drug-specific and target-specific latent vectors, respectively. Moreover, NRLMF assigns higher importance levels to positive observations (i.e., the observed interacting drug-target pairs) than negative observations (i.e., the unknown pairs). Because the positive observations are already experimentally verified, they are usually more trustworthy. Furthermore, the local structure of the drug-target interaction data has also been exploited via neighborhood regularization to achieve better prediction accuracy. We conducted extensive experiments over four benchmark datasets, and NRLMF demonstrated its effectiveness compared with five state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2016
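The core idea in the NRLMF abstract above (entry 50), an interaction probability given by a logistic function of drug and target latent vectors with positive observations weighted more heavily than unknown pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the neighborhood regularization is omitted, plain gradient ascent stands in for whatever optimizer the paper uses, and the names `c` (the importance weight for positives), `rank`, `lr`, `steps`, and the toy matrix `Y` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def interaction_probability(u, v):
    # Probability that a drug (latent vector u) interacts with a target (latent vector v).
    return sigmoid(sum(a * b for a, b in zip(u, v)))


def weighted_log_likelihood(Y, U, V, c=5.0):
    # Each observed (positive) pair counts c times; unknown pairs count once.
    # c is a hypothetical importance weight chosen for this sketch.
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            p = interaction_probability(U[i], V[j])
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total += c * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(Y, rank=2, c=5.0, lr=0.02, steps=1000, seed=0):
    # Plain gradient ascent on the weighted log-likelihood (an assumed optimizer).
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in Y]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in Y[0]]
    for _ in range(steps):
        grad_U = [[0.0] * rank for _ in U]
        grad_V = [[0.0] * rank for _ in V]
        for i, row in enumerate(Y):
            for j, y in enumerate(row):
                p = interaction_probability(U[i], V[j])
                # Derivative of the weighted log-likelihood with respect to the logit.
                g = c * y * (1.0 - p) - (1 - y) * p
                for k in range(rank):
                    grad_U[i][k] += g * V[j][k]
                    grad_V[j][k] += g * U[i][k]
        for i in range(len(U)):
            for k in range(rank):
                U[i][k] += lr * grad_U[i][k]
        for j in range(len(V)):
            for k in range(rank):
                V[j][k] += lr * grad_V[j][k]
    return U, V


# Toy drug-by-target matrix: 1 = observed interaction, 0 = unknown pair.
Y = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 0]]
U, V = fit(Y)
P = [[interaction_probability(u, v) for v in V] for u in U]
```

After fitting, `P[i][j]` is the estimated probability for drug `i` and target `j`; ranking the unknown pairs by this value is one way such a model could be used to direct experimental verification, as the abstract describes.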
Discovery Service for Jio Institute Digital Library