Database: Complementary Index / Journal: plos computational biology / Topic: mathematics - Searchworks@Jio Institute Digital Library Search Results

Showing total 666 results

1. Generation of Binary Tree-Child phylogenetic networks.

Author: Cardona, Gabriel, Pons, Joan Carles, and Scornavacca, Celine
Subjects: BOTANY, PHYSICAL sciences, BINARY number system, LIFE sciences, PLANT anatomy, GRAPH theory
Abstract: Phylogenetic networks generalize phylogenetic trees by allowing the modeling of events of reticulate evolution. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied ones. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant. Since our solution is recursive, this also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutive history of a set of sequences, represented by a BTC network, to include a new sequence. An implementation in Python of the algorithms described in this paper, along with some computational experiments, can be downloaded from . [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

2. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited.

Author: Osthus, Dave, Daughton, Ashlynn R., and Priedhorsky, Reid
Subjects: INFLUENZA, RESPIRATORY infections, PUBLIC health, MATHEMATICAL models of forecasting
Abstract: The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to improve clarity on the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from the augmentation of internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when “perfect” nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly seasonal targets, will need to derive from other, non-nowcasting approaches. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

3. A modeling study of budding yeast colony formation and its relationship to budding pattern and aging.

Author: Wang, Yanli, Lo, Wing-Cheong, and Chou, Ching-Shan
Subjects: YEAST fungi genetics, BUDDING (Zoology), ELECTRIC properties of cells, HAPLOIDY, DIPLOIDY
Abstract: Budding yeast, which undergoes polarized growth during budding and mating, has been a useful model system to study cell polarization. Bud sites are selected differently in haploid and diploid yeast cells: haploid cells bud in an axial manner, while diploid cells bud in a bipolar manner. While previous studies have been focused on the molecular details of the bud site selection and polarity establishment, not much is known about how different budding patterns give rise to different functions at the population level. In this paper, we develop a two-dimensional agent-based model to study budding yeast colonies with cell-type specific biological processes, such as budding, mating, mating type switch, consumption of nutrients, and cell death. The model demonstrates that the axial budding pattern enhances mating probability at an early stage and the bipolar budding pattern improves colony development under nutrient limitation. Our results suggest that the frequency of mating type switch might control the trade-off between diploidization and inbreeding. The effect of cellular aging is also studied through our model. Based on the simulations, colonies initiated by an aged haploid cell show declined mating probability at an early stage and recover as the rejuvenated offspring become the majority. Colonies initiated with aged diploid cells do not show a disadvantage in colony expansion, possibly due to the fact that young cells contribute the most to colony expansion. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

4. Personalized glucose forecasting for type 2 diabetes using data assimilation.

Author: Albers, David J., Levine, Matthew, Gluckman, Bruce, Ginsberg, Henry, Hripcsak, George, and Mamykina, Lena
Subjects: BLOOD sugar monitoring, TYPE 2 diabetes, QUALITY of life, GLYCEMIC control, BAYESIAN analysis, GAUSSIAN processes
Abstract: Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast an impact of a given meal on an individual’s blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to make a personalized model selection for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed in accuracy expert forecasts. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

5. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.

Author: Wu, Stephen Gang, Wang, Yuxuan, Jiang, Wu, Oyetunde, Tolutola, Yao, Ruilian, Zhang, Xuehong, Shimizu, Kazuyuki, Tang, Yinjie J., and Bao, Forrest Sheng
Subjects: METABOLIC flux analysis, SUPPORT vector machines, CELL metabolism, MACHINE learning, STOICHIOMETRY
Abstract: 13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux () that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

6. Enzyme sequestration by the substrate: An analysis in the deterministic and stochastic domains.

Author: Petrides, Andreas and Vinnicombe, Glenn
Subjects: PHOSPHORYLATION, PHOSPHATASES, KINASES, ENZYMES, SEQUESTRATION (Chemistry)
Abstract: This paper is concerned with the potential multistability of protein concentrations in the cell. That is, situations where one, or a family of, proteins may sit at one of two or more different steady state concentrations in otherwise identical cells, and in spite of them being in the same environment. For models of multisite protein phosphorylation for example, in the presence of excess substrate, it has been shown that the achievable number of stable steady states can increase linearly with the number of phosphosites available. In this paper, we analyse the consequences of adding enzyme docking to these and similar models, with the resultant sequestration of phosphatase and kinase by the fully unphosphorylated and by the fully phosphorylated substrates respectively. In the large molecule numbers limit, where deterministic analysis is applicable, we prove that there are always values for these rates of sequestration which, when exceeded, limit the extent of multistability. For the models considered here, these numbers are much smaller than the affinity of the enzymes to the substrate when it is in a modifiable state. As substrate enzyme-sequestration is increased, we further prove that the number of steady states will inevitably be reduced to one. For smaller molecule numbers a stochastic analysis is more appropriate, where multistability in the large molecule numbers limit can manifest itself as multimodality of the probability distribution; the system spending periods of time in the vicinity of one mode before jumping to another. Here, we find that substrate enzyme sequestration can induce bimodality even in systems where only a single steady state can exist at large numbers. To facilitate this analysis, we develop a weakly chained diagonally dominant M-matrix formulation of the Chemical Master Equation, allowing greater insights in the way particular mechanisms, like enzyme sequestration, can shape probability distributions and therefore exhibit different behaviour across different regimes. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

7. A simulation of the random and directed motion of dendritic cells in chemokine fields.

Author: Parr, Avery, Anderson, Nicholas R., and Hammer, Daniel A.
Subjects: DENDRITIC cells, CHEMOTAXIS, CHEMOKINE receptors, CELL receptors, ANTIGEN presenting cells, T cells, MOTION
Abstract: Dendritic cells (DCs) are the most effective professional antigen-presenting cell. They ferry antigen from the extremities to T cells and are essential for the initiation of an adaptive immune response. Despite interest in how DCs respond to chemical stimuli, there have been few attempts to model DC migration. In this paper, we simulate the motility of DCs by modeling the generation of forces by filopodia and a force balance on the cell. The direction of filopodial extension is coupled to differential occupancy of cognate chemokine receptors across the cell. Our model simulates chemokinesis and chemotaxis in a variety of chemical and mechanical environments. Simulated DCs undergoing chemokinesis were measured to have a speed of 5.1 ± 0.07 μm·min⁻¹ and a persistence time of 3.2 ± 0.46 min, consistent with experiment. Cells undergoing chemotaxis exhibited a stronger chemotactic response when exposed to lower average chemokine concentrations, also consistent with experiment. We predicted that when placed in two opposing gradients, cells will cluster in a line, which we call the “line of equistimulation;” this clustering has also been observed. We calculated the effect of varying gradient steepness on the line of equistimulation, with steeper gradients resulting in tighter clustering. Moreover, gradients are found to be most potent when cells are in a gradient of chemokine whose mean concentration is close to the binding of the Kd to the receptor, and least potent when the mean concentration is 0.1Kd. Comparing our simulations to experiment, we can give a quantitative measure of the strength of certain chemokines relative to others. Assigning the signal of CCL19 binding CCR7 a baseline strength of 1, we found CCL21 binding CCR7 had a strength of 0.28, and CXCL12 binding CXCR4 had a strength of 0.30. These differences emerge despite both chemokines having virtually the same Kd, suggesting a mechanism of signal amplification in DCs requiring further study. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

8. LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes.

Author: Collier, Olivier, Stoven, Véronique, and Vert, Jean-Philippe
Subjects: CANCER genes, MACHINE learning, LEARNING strategies, P53 antioncogene, PROTEIN-protein interactions, COMPUTATIONAL biology, TUMOR suppressor genes
Abstract: Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help finding new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning based approach which allows to integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

9. Optimizing spatial allocation of seasonal influenza vaccine under temporal constraints.

Author: Venkatramanan, Srinivasan, Chen, Jiangzhuo, Fadikar, Arindam, Gupta, Sandeep, Higdon, Dave, Lewis, Bryan, Marathe, Madhav, Mortveit, Henning, and Vullikanti, Anil
Subjects: SEASONAL influenza, INFLUENZA vaccines, FLU vaccine efficacy, HEALTH policy
Abstract: Prophylactic interventions such as vaccine allocation are some of the most effective public health policy planning tools. The supply of vaccines, however, is limited and an important challenge is to optimally allocate the vaccines to minimize epidemic impact. This resource allocation question (which we refer to as VID) has multiple dimensions: when, where, to whom, etc. Most of the existing literature on this topic deals with the latter (to whom), proposing policies that prioritize individuals by age and disease risk. However, since seasonal influenza spread has a typical spatial trend, and due to the temporal constraints enforced by the availability schedule, the when and where problems become equally, if not more, relevant. In this paper, we study the VID problem in the context of seasonal influenza spread in the United States. We develop a national scale metapopulation model for influenza that integrates both short and long distance human mobility, along with realistic data on vaccine uptake. We also design GA, a greedy algorithm for allocating the vaccine supply at the state level under temporal constraints and show that such a strategy improves over the current baseline of pro-rata allocation, and the improvement is more pronounced for higher vaccine efficacy and moderate flu season intensity. Further, the resulting strategy resembles a ring vaccination applied spatially across the US. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

10. Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI.

Author: Koppe, Georgia, Toutounji, Hazem, Kirsch, Peter, Lis, Stefanie, and Durstewitz, Daniel
Subjects: RECURRENT neural networks, NONLINEAR dynamical systems, LINEAR dynamical systems, FUNCTIONAL magnetic resonance imaging, DYNAMICAL systems
Abstract: A major tenet in theoretical neuroscience is that cognitive and behavioral processes are ultimately implemented in terms of the neural system dynamics. Accordingly, a major aim for the analysis of neurophysiological measurements should lie in the identification of the computational dynamics underlying task processing. Here we advance a state space model (SSM) based on generative piecewise-linear recurrent neural networks (PLRNN) to assess dynamics from neuroimaging data. In contrast to many other nonlinear time series models which have been proposed for reconstructing latent dynamics, our model is easily interpretable in neural terms, amenable to systematic dynamical systems analysis of the resulting set of equations, and can straightforwardly be transformed into an equivalent continuous-time dynamical system. The major contributions of this paper are the introduction of a new observation model suitable for functional magnetic resonance imaging (fMRI) coupled to the latent PLRNN, an efficient stepwise training procedure that forces the latent model to capture the ‘true’ underlying dynamics rather than just fitting (or predicting) the observations, and of an empirical measure based on the Kullback-Leibler divergence to evaluate from empirical time series how well this goal of approximating the underlying dynamics has been achieved. We validate and illustrate the power of our approach on simulated ‘ground-truth’ dynamical systems as well as on experimental fMRI time series, and demonstrate that the learnt dynamics harbors task-related nonlinear structure that a linear dynamical model fails to capture. Given that fMRI is one of the most common techniques for measuring brain activity non-invasively in human subjects, this approach may provide a novel step toward analyzing aberrant (nonlinear) dynamics for clinical assessment or neuroscientific research. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

11. Transient crosslinking kinetics optimize gene cluster interactions.

Author: Walker, Benjamin, Taylor, Dane, Lawrimore, Josh, Hult, Caitlin, Adalsteinsson, David, Bloom, Kerry, and Forest, M. Gregory
Subjects: GENE clusters, CHROMOSOME structure, COMPUTATIONAL biology, RIBOSOMAL DNA
Abstract: Our understanding of how chromosomes structurally organize and dynamically interact has been revolutionized through the lens of long-chain polymer physics. Major protein contributors to chromosome structure and dynamics are condensin and cohesin that stochastically generate loops within and between chains, and entrap proximal strands of sister chromatids. In this paper, we explore the ability of transient, protein-mediated, gene-gene crosslinks to induce clusters of genes, thereby dynamic architecture, within the highly repeated ribosomal DNA that comprises the nucleolus of budding yeast. We implement three approaches: live cell microscopy; computational modeling of the full genome during G1 in budding yeast, exploring four decades of timescales for transient crosslinks between 5kbp domains (genes) in the nucleolus on Chromosome XII; and, temporal network models with automated community (cluster) detection algorithms applied to the full range of 4D modeling datasets. The data analysis tools detect and track gene clusters, their size, number, persistence time, and their plasticity (deformation). Of biological significance, our analysis reveals an optimal mean crosslink lifetime that promotes pairwise and cluster gene interactions through “flexible” clustering. In this state, large gene clusters self-assemble yet frequently interact (merge and separate), marked by gene exchanges between clusters, which in turn maximizes global gene interactions in the nucleolus. This regime stands between two limiting cases each with far less global gene interactions: with shorter crosslink lifetimes, “rigid” clustering emerges with clusters that interact infrequently; with longer crosslink lifetimes, there is a dissolution of clusters. These observations are compared with imaging experiments on a normal yeast strain and two condensin-modified mutant cell strains. We apply the same image analysis pipeline to the experimental and simulated datasets, providing support for the modeling predictions. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

12. Primacy coding facilitates effective odor discrimination when receptor sensitivities are tuned.

Author: Zwicker, David
Subjects: ODORS, BINARY codes, COMPUTATIONAL biology, COMPUTATIONAL neuroscience, OLFACTORY receptors, SENSORY perception
Abstract: The olfactory system faces the difficult task of identifying an enormous variety of odors independent of their intensity. Primacy coding, where the odor identity is encoded by the receptor types that respond earliest, might provide a compact and informative representation that can be interpreted efficiently by the brain. In this paper, we analyze the information transmitted by a simple model of primacy coding using numerical simulations and statistical descriptions. We show that the encoded information depends strongly on the number of receptor types included in the primacy representation, but only weakly on the size of the receptor repertoire. The representation is independent of the odor intensity and the transmitted information is useful to perform typical olfactory tasks with close to experimentally measured performance. Interestingly, we find situations in which a smaller receptor repertoire is advantageous for discriminating odors. The model also suggests that overly sensitive receptor types could dominate the entire response and make the whole array useless, which allows us to predict how receptor arrays need to adapt to stay useful during environmental changes. Taken together, we show that primacy coding is more useful than simple binary and normalized coding, essentially because the sparsity of the odor representations is independent of the odor statistics, in contrast to the alternatives. Primacy coding thus provides an efficient odor representation that is independent of the odor intensity and might thus help to identify odors in the olfactory cortex. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

13. A Bayesian framework for the analysis of systems biology models of the brain.

Author: Russell-Buckland, Joshua, Barnes, Christopher P., and Tachtsidis, Ilias
Subjects: BAYESIAN analysis, BRAIN physiology, SYSTEMS biology, SENSITIVITY analysis, MODELS & modelmaking
Abstract: Systems biology models are used to understand complex biological and physiological systems. Interpretation of these models is an important part of developing this understanding. These models are often fit to experimental data in order to understand how the system has produced various phenomena or behaviour that are seen in the data. In this paper, we have outlined a framework that can be used to perform Bayesian analysis of complex systems biology models. In particular, we have focussed on analysing a systems biology of the brain using both simulated and measured data. By using a combination of sensitivity analysis and approximate Bayesian computation, we have shown that it is possible to obtain distributions of parameters that can better guard against misinterpretation of results, as compared to a maximum likelihood estimate based approach. This is done through analysis of simulated and experimental data. NIRS measurements were simulated using the same simulated systemic input data for the model in a ‘healthy’ and ‘impaired’ state. By analysing both of these datasets, we show that different parameter spaces can be distinguished and compared between different physiological states or conditions. Finally, we analyse experimental data using the new Bayesian framework and the previous maximum likelihood estimate approach, showing that the Bayesian approach provides a more complete understanding of the parameter space. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

14. Chemical features mining provides new descriptive structure-odor relationships.

Author: Licon, Carmen C., Bosc, Guillaume, Sabri, Mohammed, Mantel, Marylou, Fournel, Arnaud, Bushdid, Caroline, Golebiowski, Jerome, Robardet, Celine, Plantevit, Marc, Kaytoue, Mehdi, and Bensafi, Moustafa
Subjects: ODORS, COLOR vision, PREDICTION models, BIOLOGY, ALGORITHMS
Abstract: An important goal in researching the biology of olfaction is to link the perception of smells to the chemistry of odorants. In other words, why do some odorants smell like fruits and others like flowers? While the so-called stimulus-percept issue was resolved in the field of color vision some time ago, the relationship between the chemistry and psycho-biology of odors remains unclear up to the present day. Although a series of investigations have demonstrated that this relationship exists, the descriptive and explicative aspects of the proposed models that are currently in use require greater sophistication. One reason for this is that the algorithms of current models do not consistently consider the possibility that multiple chemical rules can describe a single quality despite the fact that this is the case in reality, whereby two very different molecules can evoke a similar odor. Moreover, the available datasets are often large and heterogeneous, thus rendering the generation of multiple rules without any use of a computational approach overly complex. We considered these two issues in the present paper. First, we built a new database containing 1689 odorants characterized by physicochemical properties and olfactory qualities. Second, we developed a computational method based on a subgroup discovery algorithm that discriminated perceptual qualities of smells on the basis of physicochemical properties. Third, we ran a series of experiments on 74 distinct olfactory qualities and showed that the generation and validation of rules linking chemistry to odor perception was possible. Taken together, our findings provide significant new insights into the relationship between stimulus and percept in olfaction. In addition, by automatically extracting new knowledge linking chemistry of odorants and psychology of smells, our results provide a new computational framework of analysis enabling scientists in the field to test original hypotheses using descriptive or predictive modeling. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

15. Model diagnostics and refinement for phylodynamic models.

Author: Lau, Max SY, Grenfell, Bryan T, Worby, Colin J, and Gibson, Gavin J
Subjects: EPIDEMIOLOGY, GENOMICS, PATHOGENIC microorganisms, BIOLOGICAL evolution, LIFE sciences, SUPERSPREADING events
Abstract: Phylodynamic modelling, which studies the joint dynamics of epidemiological and evolutionary processes, has made significant progress in recent years due to increasingly available genomic data and advances in statistical modelling. These advances have greatly improved our understanding of transmission dynamics of many important pathogens. Nevertheless, there remains a lack of effective, targeted diagnostic tools for systematically detecting model mis-specification. Development of such tools is essential for model criticism, refinement, and calibration. The idea of utilising latent residuals for model assessment has already been exploited in general spatio-temporal epidemiological settings. Specifically, by proposing appropriately designed non-centered re-parameterizations of a given epidemiological process, one can construct latent residuals with known sampling distributions which can be used to quantify evidence of model mis-specification. In this paper, we extend this idea to formulate a novel model-diagnostic framework for phylodynamic models. Using simulated examples, we show that our framework may effectively detect a particular form of mis-specification in a phylodynamic model, particularly in the event of superspreading. We also exemplify our approach by applying the framework to a dataset describing a local foot-and-mouth (FMD) outbreak in the UK, eliciting strong evidence against the assumption of no within-host-diversity in the outbreak. We further demonstrate that our framework can facilitate model calibration in real-life scenarios, by proposing a within-host-diversity model which appears to offer a better fit to data than one that assumes no within-host-diversity of FMD virus. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

16. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities.

Author: Wang, Lei, You, Zhu-Hong, Chen, Xing, Li, Yang-Ming, Dong, Ya-Nan, Li, Li-Ping, and Zheng, Kai
Subjects: LOGISTIC model (Demography), MICRORNA, MEDICAL genetics, RNA sequencing, PREDICTION models, BREAST tumors, NATURAL language processing, LYMPHOMA diagnosis
Abstract: Emerging evidence has shown microRNAs (miRNAs) play an important role in human disease research. Identifying potential association among them is significant for the development of pathology, diagnosis, and therapy. However, only a tiny portion of all miRNA-disease pairs in the current datasets are experimentally validated. This prompts the development of high-precision computational methods to predict real interaction pairs. In this paper, we propose a new model of Logistic Model Tree for predicting miRNA-Disease Association (LMTRDA) by fusing multi-source information including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. In particular, we introduce miRNA sequence information and extract its features using a natural language processing technique for the first time in the miRNA-disease prediction model. In the cross-validation experiment, LMTRDA obtained 90.51% prediction accuracy with 92.55% sensitivity at the AUC of 90.54% on the HMDD V3.0 dataset. To further evaluate the performance of LMTRDA, we compared it with different classifier and feature descriptor models. In addition, we also validate the predictive ability of LMTRDA in human diseases including Breast Neoplasms, Breast Neoplasms and Lymphoma. As a result, 28, 27 and 26 out of the top 30 miRNAs associated with these diseases were verified by experiments in different kinds of case studies. These experimental results demonstrate that LMTRDA is a reliable model for predicting the association among miRNAs and diseases. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

17. A numerical approach for a discrete Markov model for progressing drug resistance of cancer.

Author: Maeda, Masayuki and Yamashita, Hideaki
Subjects: MARKOV processes, DRUG resistance, CANCER treatment, COMPUTER simulation, PROBABILITY theory
Abstract: The presence of treatment-resistant cells is an important factor that limits the efficacy of cancer therapy, and the prospect of resistance is considered the major cause of the treatment strategy. Several recent studies have employed mathematical models to elucidate the dynamics of generating resistant cancer cells and attempted to predict the probability of emerging resistant cells. The purpose of this paper is to present a numerical approach to compute the number of resistant cells and the emerging probability of resistance. A stochastic model was designed, and a method was developed to approximately but efficiently compute the number of resistant cells and the probability of resistance. To model the progression of cancer, a discrete-state, two-dimensional Markov process whose states are the total number of cells and the number of resistant cells was employed. Then exact analysis and approximate aggregation approaches were proposed to calculate the number of resistant cells and the probability of resistance when the cell population reaches detection size. To confirm the accuracy of computed results of approximation, relative errors between exact analysis and approximation were computed. The numerical values of our approximation method were very close to those of exact analysis calculated in the range of small detection size M = 500, 100, and 1500. Then computer simulation was performed to confirm the accuracy of computed results of approximation when the detection size was M = 10000, 30000, 50000, 100000 and 1000000. All the numerical results of approximation fell between the upper level and the lower level of 95% confidence intervals and our method took less time to compute over a broad range of cell size. The effects of parameter change on emerging probabilities of resistance were also investigated by computed values using approximation method. The results showed that the number of divisions until the cell population reached the detection size is important for the emerging probability of resistance. The next step of numerical approach is to compute the emerging probabilities of resistance under drug administration and with multiple mutation. Another effective approximation would be necessary for the analysis of the latter case. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

18. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction.

Author: Allahyar, Amin, Ubels, Joske, and de Ridder, Jeroen
Subjects: CANCER patients, GENE expression, CANCER treatment, HEALTH outcome assessment, MOLECULAR genetics
Abstract: Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and for instance suggest that utilizing random networks performs on par to networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation which more closely emulates real world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased to well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to e.g. histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

19. On variational solutions for whole brain serial-section histology using a Sobolev prior in the computational anatomy random orbit model.

Author: Lee, Brian C., Tward, Daniel J., Mitra, Partha P., and Miller, Michael I.
Subjects: HISTOLOGICAL techniques, DIFFEOMORPHISMS, HISTOLOGY, BRAIN, MICE
Abstract: This paper presents a variational framework for dense diffeomorphic atlas-mapping onto high-throughput histology stacks at the 20 μm meso-scale. The observed sections are modelled as Gaussian random fields conditioned on a sequence of unknown section by section rigid motions and unknown diffeomorphic transformation of a three-dimensional atlas. To regularize over the high-dimensionality of our parameter space (which is a product space of the rigid motion dimensions and the diffeomorphism dimensions), the histology stacks are modelled as arising from a first order Sobolev space smoothness prior. We show that the joint maximum a-posteriori, penalized-likelihood estimator of our high dimensional parameter space emerges as a joint optimization interleaving rigid motion estimation for histology restacking and large deformation diffeomorphic metric mapping to atlas coordinates. We show that joint optimization in this parameter space solves the classical curvature non-identifiability of the histology stacking problem. The algorithms are demonstrated on a collection of whole-brain histological image stacks from the Mouse Brain Architecture Project. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

20. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions.

Author: Zhang, Wen, Tang, Guifeng, Huang, Feng, Zhang, Xining, Yue, Xiang, and Wu, Wenjian
Subjects: RNA-protein interactions, GENETIC regulation, RNA interference, RNA splicing, ADENYLATION (Biochemistry)
Abstract: LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but features are not available for all lncRNAs or proteins; most of existing methods are not capable of predicting interacting proteins (or lncRNAs) for new lncRNAs (or proteins), which don’t have known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features with a feature projection ensemble learning frame. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find out novel lncRNA-protein interactions, which are confirmed by literature. Finally, we construct a user-friendly web server, available at . [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

21. Bayesian adaptive dual control of deep brain stimulation in a computational model of Parkinson’s disease.

Author: Grado, Logan L., Johnson, Matthew D., and Netoff, Theoden I.
Subjects: BAYESIAN analysis, PROBABILITY theory, BRAIN stimulation, KINDLING (Neurology), TRANSCRANIAL magnetic stimulation
Abstract: In this paper, we present a novel Bayesian adaptive dual controller (ADC) for autonomously programming deep brain stimulation devices. We evaluated the Bayesian ADC’s performance in the context of reducing beta power in a computational model of Parkinson’s disease, in which it was tasked with finding the set of stimulation parameters which optimally reduced beta power as fast as possible. Here, the Bayesian ADC has dual goals: (a) to minimize beta power by exploiting the best parameters found so far, and (b) to explore the space to find better parameters, thus allowing for better control in the future. The Bayesian ADC is composed of two parts: an inner parameterized feedback stimulator and an outer parameter adjustment loop. The inner loop operates on a short time scale, delivering stimulus based upon the phase and power of the beta oscillation. The outer loop operates on a long time scale, observing the effects of the stimulation parameters and using Bayesian optimization to intelligently select new parameters to minimize the beta power. We show that the Bayesian ADC can efficiently optimize stimulation parameters, and is superior to other optimization algorithms. The Bayesian ADC provides a robust and general framework for tuning stimulation parameters, can be adapted to use any feedback signal, and is applicable across diseases and stimulator designs. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

22. Rearrangement moves on rooted phylogenetic networks.

Author: Gambette, Philippe, Van Iersel, Leo, Jones, Mark, Lafond, Manuel, Pardi, Fabio, and Scornavacca, Celine
Subjects: TREE physiology, PHYLOGENY, BAYESIAN analysis, HEURISTIC algorithms
Abstract: Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic network—that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves—named rNNI and rSPR—reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results—separating the contributions of horizontal and vertical moves—we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. 
Our goal is to provide a solid basis for practical phylogenetic network reconstruction. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

23. Dynamic compensation, parameter identifiability, and equivariances.

Author: Sontag, Eduardo D.
Subjects: BIOLOGICAL circuits, GLUCOSE, HOMEOSTASIS, SYSTEMS biology, CONTROL theory (Engineering)
Abstract: A recent paper by Karin et al. introduced a mathematical notion called dynamical compensation (DC) of biological circuits. DC was shown to play an important role in glucose homeostasis as well as other key physiological regulatory mechanisms. Karin et al. went on to provide a sufficient condition to test whether a given system has the DC property. Here, we show how DC is a reformulation of a well-known concept in systems biology, statistics, and control theory—that of parameter structural non-identifiability. Viewing DC as a parameter identification problem enables one to take advantage of powerful theoretical and computational tools to test a system for DC. We obtain as a special case the sufficient criterion discussed by Karin et al. We also draw connections to system equivalence and to the fold-change detection property. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

24. Efficient pedigree recording for fast population genetics simulation.

Author: Kelleher, Jerome, Thornton, Kevin R., Ashander, Jaime, and Ralph, Peter L.
Subjects: POPULATION genetics, EUKARYOTES, PHYLOGENY, GENOTYPES, ALGORITHMS
Abstract: In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly ‘simplify’ a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation’s entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. The software for recording and processing tree sequences is named tskit. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

25. Predicting B cell receptor substitution profiles using public repertoire data.

Author: Dhar, Amrit, Davidsen, Kristian, Matsen IV, Frederick A., and Minin, Vladimir N.
Subjects: B cell receptors, AMINO acids, GENETIC mutation, CLONING, GERMINAL centers, IMMUNOTECHNOLOGY
Abstract: B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same “clonal family”) are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called “substitution profiles”, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method “Substitution Profiles Using Related Families” (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package () implementing these ideas and providing easy prediction using our pre-fit models. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

26. Informational structures: A dynamical system approach for integrated information.

Author: Esteban, Francisco J., Galadí, Javier, Langa, José A., Portillo, José R., and Soler-Toscano, Fernando
Subjects: DATA structures, DYNAMICAL systems, ELECTRONIC data processing, INFORMATION measurement, GRAPH theory
Abstract: Integrated Information Theory (IIT) has become nowadays the most sensible general theory of consciousness. In addition to very important statements, it opens the door for an abstract (mathematical) formulation of the theory. Given a mechanism in a particular state, IIT identifies a conscious experience with a conceptual structure, an informational object which exists, is composed of identified parts, is informative, integrated and maximally irreducible. This paper introduces a space-time continuous version of the concept of integrated information. To this aim, a graph and a dynamical systems treatment is used to define, for a given mechanism in a state for which a dynamics is settled, an Informational Structure, which is associated to the global attractor at each time of the system. By definition, the informational structure determines all the past and future behavior of the system, possesses an informational nature and, moreover, enriches all the points of the phase space with cause-effect power by means of its associated Informational Field. A detailed description of its inner structure by invariants and connections between them allows to associate a transition probability matrix to each informational structure and to develop a measure for the level of integrated information of the system. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

27. A marginalized two-part Beta regression model for microbiome compositional data.

Author: Chai, Haitao, Jiang, Hongmei, Lin, Lu, and Liu, Lei
Subjects: MICROORGANISMS, HUMAN microbiota, REGRESSION analysis, PUBLIC health, METAGENOMICS
Abstract: In microbiome studies, an important goal is to detect differential abundance of microbes across clinical conditions and treatment options. However, the microbiome compositional data (quantified by relative abundance) are highly skewed, bounded in [0, 1), and often have many zeros. A two-part model is commonly used to separate zeros and positive values explicitly by two submodels: a logistic model for the probability of a species being present in Part I, and a Beta regression model for the relative abundance conditional on the presence of the species in Part II. However, the regression coefficients in Part II cannot provide a marginal (unconditional) interpretation of covariate effects on the microbial abundance, which is of great interest in many applications. In this paper, we propose a marginalized two-part Beta regression model which captures the zero-inflation and skewness of microbiome data and also allows investigators to examine covariate effects on the marginal (unconditional) mean. We demonstrate its practical performance using simulation studies and apply the model to a real metagenomic dataset on mouse skin microbiota. We find that under the proposed marginalized model, without loss in power, the likelihood ratio test performs better in controlling the type I error than those under conventional methods. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
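
Note on entry 27: the distinction between the conditional and the marginal mean in a two-part model follows from the law of total expectation. The block below is only a sketch of that identity. The symbols (Y, p, μ, ν, x, α, β) are illustrative notation not taken from the abstract, and the logit link on the marginal mean is an assumption about one natural parameterization, not a statement of the authors' exact model.

```latex
% Y = relative abundance in [0,1), x = covariate vector (illustrative notation).
% Part I  (presence):   p   = P(Y > 0),       logit(p) = x^\top \alpha
% Part II (abundance):  \mu = E[Y \mid Y > 0], modelled by Beta regression
\begin{align}
  \nu \;=\; E[Y] \;&=\; P(Y = 0)\cdot 0 \;+\; P(Y > 0)\, E[Y \mid Y > 0] \;=\; p\,\mu .
\end{align}
% A marginalized model places the regression on \nu directly, for example
% (assumed link):  logit(\nu) = x^\top \beta,  with  \mu = \nu / p  implied,
% so that \beta describes covariate effects on the unconditional mean.
```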

28. A minimally invasive neurostimulation method for controlling abnormal synchronisation in the neuronal activity.

Author: Asllani, Malbor, Expert, Paul, and Carletti, Timoteo
Subjects: NEURAL stimulation, SYNCHRONIC order, NEURAL physiology, PARKINSON'S disease, CONTROL theory (Engineering)
Abstract: Many collective phenomena in Nature emerge from the -partial- synchronisation of the units comprising a system. In the case of the brain, this self-organised process allows groups of neurons to fire in highly intricate partially synchronised patterns and eventually lead to high level cognitive outputs and control over the human body. However, when the synchronisation patterns are altered and hypersynchronisation occurs, undesirable effects can occur. This is particularly striking and well documented in the case of epileptic seizures and tremors in neurodegenerative diseases such as Parkinson’s disease. In this paper, we propose an innovative, minimally invasive, control method that can effectively desynchronise misfiring brain regions and thus mitigate and even eliminate the symptoms of the diseases. The control strategy, grounded in the Hamiltonian control theory, is applied to ensembles of neurons modelled via the Kuramoto or the Stuart-Landau models and allows for heterogeneous coupling among the interacting unities. The theory has been complemented with dedicated numerical simulations performed using the small-world Newman-Watts network and the random Erdős-Rényi network. Finally the method has been compared with the gold-standard Proportional-Differential Feedback control technique. Our method is shown to achieve equivalent levels of desynchronisation using lesser control strength and/or fewer controllers, being thus minimally invasive. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
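
Note on entry 28: the abstract refers to ensembles of Kuramoto oscillators with heterogeneous coupling. The sketch below shows only the uncontrolled Kuramoto dynamics and the standard order parameter used to quantify synchronisation. It does not implement the Hamiltonian control strategy of the paper. The function names, the forward-Euler integrator and the normalisation of the coupling by the number of oscillators are assumptions made for illustration.

```python
import cmath
import math


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r near 1 means synchrony."""
    return abs(sum(cmath.exp(1j * theta) for theta in phases)) / len(phases)


def kuramoto_step(phases, omegas, coupling, adjacency, dt):
    """One forward-Euler step of d(theta_i)/dt =
    omega_i + (coupling / n) * sum_j adjacency[i][j] * sin(theta_j - theta_i).

    adjacency is an n x n list of lists (hypothetical input format); using
    non-uniform weights gives heterogeneous coupling.
    """
    n = len(phases)
    new_phases = []
    for i in range(n):
        interaction = sum(
            adjacency[i][j] * math.sin(phases[j] - phases[i]) for j in range(n)
        )
        new_phases.append(phases[i] + dt * (omegas[i] + coupling / n * interaction))
    return new_phases


def simulate(phases, omegas, coupling, adjacency, dt, steps):
    """Iterate kuramoto_step and return the final phases."""
    for _ in range(steps):
        phases = kuramoto_step(phases, omegas, coupling, adjacency, dt)
    return phases
```

For example, two identical oscillators coupled to each other and started one radian apart should pull together, so `order_parameter` of the final phases approaches 1. A desynchronising controller would aim to lower that value.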

29. 3D morphology-based clustering and simulation of human pyramidal cell dendritic spines.

Author: Luengo-Sanchez, Sergio, Fernaud-Espinosa, Isabel, Bielza, Concha, Benavides-Piccione, Ruth, Larrañaga, Pedro, and DeFelipe, Javier
Subjects: DENDRITIC cells, PYRAMIDAL neurons, DENDRITES, NEURONS, CEREBRAL cortex, DENDRITIC spines, BRAIN mapping
Abstract: The dendritic spines of pyramidal neurons are the targets of most excitatory synapses in the cerebral cortex. They have a wide variety of morphologies, and their morphology appears to be critical from the functional point of view. To further characterize dendritic spine geometry, we used in this paper over 7,000 individually 3D reconstructed dendritic spines from human cortical pyramidal neurons to group dendritic spines using model-based clustering. This approach uncovered six separate groups of human dendritic spines. To better understand the differences between these groups, the discriminative characteristics of each group were identified as a set of rules. Model-based clustering was also useful for simulating accurate 3D virtual representations of spines that matched the morphological definitions of each cluster. This mathematical approach could provide a useful tool for theoretical predictions on the functional features of human pyramidal neurons based on the morphology of dendritic spines. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

30. The effect of cell geometry on polarization in budding yeast.

Author: Trogdon, Michael, Drawert, Brian, Gomez, Carlos, Banavar, Samhita P., Yi, Tau-Mu, Campàs, Otger, and Petzold, Linda R.
Subjects: SACCHAROMYCES cerevisiae, STEM cells, BIOLOGICAL evolution, GENETIC transcription, SYNTHETIC biology
Abstract: The localization (or polarization) of proteins on the membrane during the mating of budding yeast (Saccharomyces cerevisiae) is an important model system for understanding simple pattern formation within cells. While there are many existing mathematical models of polarization, for both budding and mating, there are still many aspects of this process that are not well understood. In this paper we set out to elucidate the effect that the geometry of the cell can have on the dynamics of certain models of polarization. Specifically, we look at several spatial stochastic models of Cdc42 polarization that have been adapted from published models, on a variety of tip-shaped geometries, to replicate the shape change that occurs during the growth of the mating projection. We show here that there is a complex interplay between the dynamics of polarization and the shape of the cell. Our results show that while models of polarization can generate a stable polarization cap, its localization at the tip of mating projections is unstable, with the polarization cap drifting away from the tip of the projection in a geometry dependent manner. We also compare predictions from our computational results to experiments that observe cells with projections of varying lengths, and track the stability of the polarization cap. Lastly, we examine one model of actin polarization and show that it is unlikely, at least for the models studied here, that actin dynamics and vesicle traffic are able to overcome this effect of geometry. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

31. Minimal model of interictal and ictal discharges “Epileptor-2”.

Author: Chizhov, Anton V., Zefirov, Artyom V., Amakhin, Dmitry V., Smirnova, Elena Yu., and Zaitsev, Aleksey V.
Subjects: INTERNEURONS, MEMBRANE potential, ACTION potentials, POTASSIUM, NEUROSCIENCES
Abstract: Seizures occur in a recurrent manner with intermittent states of interictal and ictal discharges (IIDs and IDs). The transitions to and from IDs are determined by a set of processes, including synaptic interaction and ionic dynamics. Although mathematical models of separate types of epileptic discharges have been developed, modeling the transitions between states remains a challenge. A simple generic mathematical model of seizure dynamics (Epileptor) has recently been proposed by Jirsa et al. (2014); however, it is formulated in terms of abstract variables. In this paper, a minimal population-type model of IIDs and IDs is proposed that is as simple to use as the Epileptor, but the suggested model attributes physical meaning to the variables. The model is expressed in ordinary differential equations for extracellular potassium and intracellular sodium concentrations, membrane potential, and short-term synaptic depression variables. A quadratic integrate-and-fire model driven by the population input current is used to reproduce spike trains in a representative neuron. In simulations, potassium accumulation governs the transition from the silent state to the state of an ID. Each ID is composed of clustered IID-like events. The sodium accumulates during discharge and activates the sodium-potassium pump, which terminates the ID by restoring the potassium gradient and thus polarizing the neuronal membranes. The whole-cell and cell-attached recordings of a 4-AP-based in vitro model of epilepsy confirmed the primary model assumptions and predictions. The mathematical analysis revealed that the IID-like events are large-amplitude stochastic oscillations, which in the case of ID generation are controlled by slow oscillations of ionic concentrations. The IDs originate in the conditions of elevated potassium concentrations in a bath solution via a saddle-node-on-invariant-circle-like bifurcation for a non-smooth dynamical system. By providing a minimal biophysical description of ionic dynamics and network interactions, the model may serve as a hierarchical base from a simple to more complex modeling of seizures. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

32. Simulations to benchmark time-varying connectivity methods for fMRI.

Author: Thompson, William Hedley, Richter, Craig Geoffrey, Plavén-Sigray, Pontus, and Fransson, Peter
Subjects: FUNCTIONAL magnetic resonance imaging, BRAIN imaging, SIMULATION methods & models, MULTIPLICATION, ANALYSIS of covariance
Abstract: There is a current interest in quantifying time-varying connectivity (TVC) based on neuroimaging data such as fMRI. Many methods have been proposed, and are being applied, revealing new insight into the brain’s dynamics. However, given that the ground truth for TVC in the brain is unknown, many concerns remain regarding the accuracy of proposed estimates. Since there exist many TVC methods it is difficult to assess differences in time-varying connectivity between studies. In this paper, we present tvc_benchmarker, which is a Python package containing four simulations to test TVC methods. Here, we evaluate five different methods that together represent a wide spectrum of current approaches to estimating TVC (sliding window, tapered sliding window, multiplication of temporal derivatives, spatial distance and jackknife correlation). These simulations were designed to test each method’s ability to track changes in covariance over time, which is a key property in TVC analysis. We found that all tested methods correlated positively with each other, but there were large differences in the strength of the correlations between methods. To facilitate comparisons with future TVC methods, we propose that the described simulations can act as benchmark tests for evaluation of methods. Using tvc_benchmarker researchers can easily add, compare and submit their own TVC methods to evaluate its performance. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
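
Note on entry 32: of the five methods named in the abstract, the sliding window is the simplest to illustrate. The sketch below computes a Pearson correlation in each window of two time series. It is a generic illustration and does not use the tvc_benchmarker package or its API. The function names and the unit step between windows are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def sliding_window_correlation(x, y, window):
    """Correlation of x and y in every window of the given length (step 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and len(x)")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]
```

A series of length T with window length w yields T - w + 1 estimates. Shorter windows track changes in covariance faster but give noisier estimates, which is the kind of trade-off benchmark simulations are meant to expose.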

33. Correcting for batch effects in case-control microbiome studies.

Author: Gibbons, Sean M., Duvallet, Claire, and Alm, Eric J.
Subjects: CASE-control method, MICROARRAY technology, RNA, MICROBIAL genomics
Abstract: High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
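
Illustrative note (not from the paper): the abstract above describes converting each feature in case samples to a percentile of the same feature in the control samples of that study. A minimal sketch of that idea for a single feature follows. The function names and the tie-handling rule (midpoint between strictly-below and at-or-below counts) are assumptions; the authors' implementation may differ.

```python
from bisect import bisect_left, bisect_right


def percentile_of_score(sorted_controls, value):
    """Percentile (0-100) of `value` within the sorted control values.

    Ties are counted as half below and half above (an assumed convention).
    """
    n = len(sorted_controls)
    if n == 0:
        raise ValueError("at least one control value is required")
    below = bisect_left(sorted_controls, value)
    at_or_below = bisect_right(sorted_controls, value)
    return 100.0 * (below + at_or_below) / (2 * n)


def percentile_normalize(case_values, control_values):
    """Express each case value as a percentile of the control distribution."""
    reference = sorted(control_values)
    return [percentile_of_score(reference, v) for v in case_values]


# Hypothetical abundances of one taxon within a single study.
controls = [1, 2, 3, 4]
cases = [0, 2.5, 5, 2]
normalized = percentile_normalize(cases, controls)
```

Because the output is on a 0 to 100 scale anchored to each study's own controls, values from different studies can be pooled without assuming a parametric batch model.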

34. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.

Author: Zhang, Huanan, Lee, Catherine A. A., Li, Zhuliu, Garbe, John R., Eide, Cindy R., Petegrosso, Raphael, Kuang, Rui, and Tolar, Jakub
Subjects: EPIDERMOLYSIS bullosa, RNA sequencing, FLOW cytometry, BIOMARKERS, GENE expression
Abstract: Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale Drop-seq dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at . [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

35. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data.

Author: Dotu, Ivan, Adamson, Scott I., Coleman, Benjamin, Fournier, Cyril, Ricart-Altimiras, Emma, Eyras, Eduardo, and Chuang, Jeffrey H.
Subjects: IMMUNOPRECIPITATION, RNA-binding proteins, PROTEIN-protein interactions, NUCLEOTIDE sequence, RNA splicing
Abstract: RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

36. Genetic programming based models in plant tissue culture: An addendum to traditional statistical approach.

Author: Mridula, Meenu R., Nair, Ashalatha S., and Kumar, K. Satheesh
Subjects: NAPHTHALENE, ACETIC acid, CHARCOAL, CALLUS (Botany), PLANT roots
Abstract: In this paper, we compared the efficacy of observation based modeling approach using a genetic algorithm with the regular statistical analysis as an alternative methodology in plant research. Preliminary experimental data on in vitro rooting was taken for this study with an aim to understand the effect of charcoal and naphthalene acetic acid (NAA) on successful rooting and also to optimize the two variables for maximum result. Observation-based modelling, as well as traditional approach, could identify NAA as a critical factor in rooting of the plantlets under the experimental conditions employed. Symbolic regression analysis using the software deployed here optimised the treatments studied and was successful in identifying the complex non-linear interaction among the variables, with minimalistic preliminary data. The presence of charcoal in the culture medium has a significant impact on root generation by reducing basal callus mass formation. Such an approach is advantageous for establishing in vitro culture protocols as these models will have significant potential for saving time and expenditure in plant tissue culture laboratories, and it further reduces the need for specialised background. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

37. A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination.

Author: Collins, Caitlin and Didelot, Xavier
Subjects: PHYLOGENY, MICROORGANISMS, NEISSERIA meningitidis, PENICILLIN, DRUG resistance in bacteria
Abstract: Genome-Wide Association Studies (GWAS) in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, microbial GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at . [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

38. What drives the perceptual change resulting from speech motor adaptation? Evaluation of hypotheses in a Bayesian modeling framework.

Author: Patri, Jean-François, Perrier, Pascal, Schwartz, Jean-Luc, and Diard, Julien
Subjects: MOTOR ability, MOTOR ability testing, SPEECH perception, HEARING, PERTURBATION theory
Abstract: Shifts in perceptual boundaries resulting from speech motor learning induced by perturbations of the auditory feedback were taken as evidence for the involvement of motor functions in auditory speech perception. Beyond this general statement, the precise mechanisms underlying this involvement are not yet fully understood. In this paper we propose a quantitative evaluation of some hypotheses concerning the motor and auditory updates that could result from motor learning, in the context of various assumptions about the roles of the auditory and somatosensory pathways in speech perception. This analysis was made possible thanks to the use of a Bayesian model that implements these hypotheses by expressing the relationships between speech production and speech perception in a joint probability distribution. The evaluation focuses on how the hypotheses can (1) predict the location of perceptual boundary shifts once the perturbation has been removed, (2) account for the magnitude of the compensation in presence of the perturbation, and (3) describe the correlation between these two behavioral characteristics. Experimental findings about changes in speech perception following adaptation to auditory feedback perturbations serve as reference. Simulations suggest that they are compatible with a framework in which motor adaptation updates both the auditory-motor internal model and the auditory characterization of the perturbed phoneme, and where perception involves both auditory and somatosensory pathways. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

39. Bayesian inference of phylogenetic networks from bi-allelic genetic markers.

Author: Zhu, Jiafan, Wen, Dingqiao, Yu, Yun, Meudt, Heidi M., and Nakhleh, Luay
Subjects: BAYESIAN analysis, PHYLOGENY, INFERENTIAL statistics, GENETIC markers in plants, PLANTAGINACEAE
Abstract: Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined both a model of evolution for the bi-allelic markers, as well as the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method utilizes a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method has a very good performance in terms of accuracy and robustness as we demonstrate on simulated data, as well as a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

40. Forecasting Human African Trypanosomiasis Prevalences from Population Screening Data Using Continuous Time Models.

Author: De Vries, Harwin, Wagelmans, Albert P. M., Hasker, Epco, Lumbala, Crispin, Lutumba, Pascal, De Vlas, Sake J., and Klundert, Joris Van De
Subjects: AFRICAN trypanosomiasis, MEDICAL screening, DISEASE prevalence, DISEASE progression, EPIDEMICS, DIAGNOSIS
Abstract: To eliminate and eradicate gambiense human African trypanosomiasis (HAT), maximizing the effectiveness of active case finding is of key importance. The progression of the epidemic is largely influenced by the planning of these operations. This paper introduces and analyzes five models for predicting HAT prevalence in a given village based on past observed prevalence levels and past screening activities in that village. Based on the quality of prevalence level predictions in 143 villages in Kwamouth (DRC), and based on the theoretical foundation underlying the models, we consider variants of the Logistic Model—a model inspired by the SIS epidemic model—to be most suitable for predicting HAT prevalence levels. Furthermore, we demonstrate the applicability of this model to predict the effects of planning policies for screening operations. Our analysis yields an analytical expression for the screening frequency required to reach eradication (zero prevalence) and a simple approach for determining the frequency required to reach elimination within a given time frame (one case per 10000). Furthermore, the model predictions suggest that annual screening is only expected to lead to eradication if at least half of the cases are detected during the screening rounds. This paper extends knowledge on control strategies for HAT and serves as a basis for further modeling and optimization studies. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

41. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.

Author: Wiedenhoeft, John, Brugel, Eric, and Schliep, Alexander
Subjects: MARKOV processes, WAVELETS (Mathematics), FORWARD-backward algorithm, CHROMOSOME fragments, BAYESIAN analysis
Abstract: By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at (DOI: ). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

42. Bayes-optimal estimation of overlap between populations of fixed size.

Author: Larremore, Daniel B.
Subjects: POPULATION, TAXONOMY, ACQUISITION of data, STOCHASTIC processes, ECOLOGY, EPIDEMIOLOGY
Abstract: Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects—species, taxonomical units, or gene variants, depending on the context—can be directly counted. In practice, however, only a fraction of each population’s objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

43. Point process analysis of noise in early invertebrate vision.

Author: Parag, Kris V. and Vinnicombe, Glenn
Subjects: NOISE, INVERTEBRATES, PHOTONS, LIGHT intensity, G proteins
Abstract: Noise is a prevalent and sometimes even dominant aspect of many biological processes. While many natural systems have adapted to attenuate or even usefully integrate noise, the variability it introduces often still delimits the achievable precision across biological functions. This is particularly so for visual phototransduction, the process responsible for converting photons of light into usable electrical signals (quantum bumps). Here, randomness of both the photon inputs (regarded as extrinsic noise) and the conversion process (intrinsic noise) are seen as two distinct, independent and significant limitations on visual reliability. Past research has attempted to quantify the relative effects of these noise sources by using approximate methods that do not fully account for the discrete, point process and time ordered nature of the problem. As a result the conclusions drawn from these different approaches have led to inconsistent expositions of phototransduction noise performance. This paper provides a fresh and complete analysis of the relative impact of intrinsic and extrinsic noise in invertebrate phototransduction using minimum mean squared error reconstruction techniques based on Bayesian point process (Snyder) filters. An integrate-fire based algorithm is developed to reliably estimate photon times from quantum bumps and Snyder filters are then used to causally estimate random light intensities both at the front and back end of the phototransduction cascade. Comparison of these estimates reveals that the dominant noise source transitions from extrinsic to intrinsic as light intensity increases. By extending the filtering techniques to account for delays, it is further found that among the intrinsic noise components, which include bump latency (mean delay and jitter) and shape (amplitude and width) variance, it is the mean delay that is critical to noise performance. As the timeliness of visual information is important for real-time action, this delay could potentially limit the speed at which invertebrates can respond to stimuli. Consequently, if one wants to increase visual fidelity, reducing the photoconversion lag is much more important than improving the regularity of the electrical signal. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

44. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

Author: Gabasova, Evelina, Reid, John, and Wernisch, Lorenz
Subjects: GENE expression, DNA copy number variations, MICRORNA, DNA methylation, PROTEOMICS
Abstract: Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

45. PCSF: An R-package for network-based interpretation of high-throughput data.

Author: Akhmedov, Murodzhon, Kedaigle, Amanda, Chong, Renan Escalante, Montemanni, Roberto, Bertoni, Francesco, Fraenkel, Ernest, and Kwee, Ivo
Subjects: BIOINFORMATICS software, DATA analysis software, MATHEMATICAL optimization, COMPUTATIONAL biology, PROTEIN-protein interactions
Abstract: With the recent technological developments a vast amount of high-throughput data has been profiled to understand the mechanism of complex diseases. The current bioinformatics challenge is to interpret the data and underlying biology, where efficient algorithms for analyzing heterogeneous high-throughput data using biological networks are becoming increasingly valuable. In this paper, we propose a software package based on the Prize-collecting Steiner Forest graph optimization approach. The PCSF package performs fast and user-friendly network analysis of high-throughput data by mapping the data onto biological networks such as protein-protein interaction, gene-gene interaction or any other correlation or coexpression based networks. Using the interaction networks as a template, it determines high-confidence subnetworks relevant to the data, which potentially leads to predictions of functional units. It also interactively visualizes the resulting subnetwork with functional enrichment analysis. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

46. Fast and general tests of genetic interaction for genome-wide association studies.

Author: Frånberg, Mattias, Strawbridge, Rona J., Hamsten, Anders, de Faire, Ulf, Lagergren, Jens, and Sennblad, Bengt
Subjects: GENOMES, DISEASES, CORONARY disease, PHYSICAL sciences, CARDIOLOGY
Abstract: A complex disease has, by definition, multiple genetic causes. In theory, these causes could be identified individually, but their identification will likely benefit from informed use of anticipated interactions between causes. In addition, characterizing and understanding interactions must be considered key to revealing the etiology of any complex disease. Large-scale collaborative efforts are now paving the way for comprehensive studies of interaction. As a consequence, there is a need for methods with a computational efficiency sufficient for modern data sets as well as for improvements of statistical accuracy and power. Another issue is that, currently, the relation between different methods for interaction inference is in many cases not transparent, complicating the comparison and interpretation of results between different interaction studies. In this paper we present computationally efficient tests of interaction for the complete family of generalized linear models (GLMs). The tests can be applied for inference of single or multiple interaction parameters, but we show, by simulation, that jointly testing the full set of interaction parameters yields superior power and control of false positive rate. Based on these tests we also describe how to combine results from multiple independent studies of interaction in a meta-analysis. We investigate the impact of several assumptions commonly made when modeling interactions. We also show that, across the important class of models with a full set of interaction parameters, jointly testing the interaction parameters yields identical results. Further, we apply our method to genetic data for cardiovascular disease. This allowed us to identify a putative interaction involved in Lp(a) plasma levels between two ‘tag’ variants in the LPA locus (p = 2.42 × 10^-9) as well as replicate the interaction (p = 6.97 × 10^-7). Finally, our meta-analysis method is used in a small (N = 16,181) study of interactions in myocardial infarction. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

47. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.

Author: Cai, Yunpeng, Zheng, Wei, Yao, Jin, Yang, Yujie, Mai, Volker, Mao, Qi, and Sun, Yijun
Subjects: GENOMICS, QUADRATIC programming, HUMAN microbiota, RIBOSOMAL RNA, BIOACCUMULATION
Abstract: The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step to perform in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, it is currently computationally expensive to perform hierarchical clustering of extremely large sequence datasets due to its quadratic time and space complexities. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the human microbiome project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence datasets. Our experiment demonstrated that with the power of parallel computing it is now computationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at . [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

48. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

Author: Wang, Sheng, Sun, Siqi, Li, Zhen, Zhang, Renyu, and Xu, Jinbo
Subjects: PROTEIN structure, ARTIFICIAL neural networks, PROTEIN folding, PAIRED comparisons (Mathematics), AMINO acid sequence
Abstract: Motivation: Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain high-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF
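
Illustrative note (not from the paper): the "top L" and "top L/10" long-range accuracies quoted in the abstract above are usually computed by ranking predicted residue pairs by score, keeping the best L/k pairs (L being the sequence length), and reporting the fraction that are true contacts. The sketch below assumes that definition. The function name, the input layout, and the default minimum sequence separation of 24 residues for "long-range" are assumptions based on common practice, not details taken from the abstract.

```python
def top_l_accuracy(scores, true_contacts, seq_len, divisor=1, min_sep=24):
    """Fraction of the top seq_len // divisor predicted pairs that are true contacts.

    scores: dict mapping (i, j) residue index pairs with i < j to a predicted score.
    true_contacts: set of (i, j) pairs that are contacts in the native structure.
    min_sep: minimum |i - j| for a pair to count as long-range (assumed default).
    """
    candidates = [
        (score, pair)
        for pair, score in scores.items()
        if pair[1] - pair[0] >= min_sep
    ]
    if not candidates:
        raise ValueError("no residue pairs satisfy the separation threshold")
    # Highest-scoring predictions first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    k = max(1, seq_len // divisor)
    top = candidates[:k]
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


# Hypothetical 4-residue toy example with a reduced separation threshold.
toy_scores = {(0, 2): 0.9, (0, 3): 0.8, (1, 3): 0.7, (0, 1): 0.99}
toy_truth = {(0, 2), (1, 3)}
acc_top_l = top_l_accuracy(toy_scores, toy_truth, seq_len=4, divisor=1, min_sep=2)
acc_top_l_half = top_l_accuracy(toy_scores, toy_truth, seq_len=4, divisor=2, min_sep=2)
```

Smaller cut-offs such as L/10 focus on the most confident predictions, which is why the L/10 figures in the abstract are higher than the top L figures for every method.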

49. A Scalable Computational Framework for Establishing Long-Term Behavior of Stochastic Reaction Networks.

Author: Gupta, Ankit, Briat, Corentin, and Khammash, Mustafa
Subjects: COMPUTATIONAL biology, STOCHASTIC processes, RANDOM variables, MATHEMATICAL optimization, INFORMATION networks
Abstract: Reaction networks are systems in which the populations of a finite number of species evolve through predefined interactions. Such networks are found as modeling tools in many biological disciplines such as biochemistry, ecology, epidemiology, immunology, systems biology and synthetic biology. It is now well-established that, for small population sizes, stochastic models for biochemical reaction networks are necessary to capture randomness in the interactions. The tools for analyzing such models, however, still lag far behind their deterministic counterparts. In this paper, we bridge this gap by developing a constructive framework for examining the long-term behavior and stability properties of the reaction dynamics in a stochastic setting. In particular, we address the problems of determining ergodicity of the reaction dynamics, which is analogous to having a globally attracting fixed point for deterministic dynamics. We also examine when the statistical moments of the underlying process remain bounded with time and when they converge to their steady state values. The framework we develop relies on a blend of ideas from probability theory, linear algebra and optimization theory. We demonstrate that the stability properties of a wide class of biological networks can be assessed from our sufficient theoretical conditions that can be recast as efficient and scalable linear programs, well-known for their tractability. It is notably shown that the computational complexity is often linear in the number of species. We illustrate the validity, the efficiency and the wide applicability of our results on several reaction networks arising in biochemistry, systems biology, epidemiology and ecology. The biological implications of the results as well as an example of a non-ergodic biological network are also discussed. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

50. Forward and Backward Inference in Spatial Cognition.

Author: Penny, Will D., Zeidman, Peter, and Burgess, Neil
Subjects: COMPUTATIONAL biology, DECISION making, HIPPOCAMPUS (Brain), ENTORHINAL cortex, CELLULAR signal transduction, AFFERENT pathways, PATH integrals
Abstract: This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of ‘lower-level’ computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF
