199 results
Search Results
2. Bayes-optimal estimation of overlap between populations of fixed size
- Author
-
Larremore, Daniel B.
- Subjects
0301 basic medicine ,Plasmodium ,Epidemiology ,Computer science ,Artificial Gene Amplification and Extension ,Polymerase Chain Reaction ,Bayes' theorem ,0302 clinical medicine ,Statistics ,Medicine and Health Sciences ,lcsh:QH301-705.5 ,Protozoans ,education.field_of_study ,Ecology ,Malarial Parasites ,Uncertainty ,Eukaryota ,Sampling (statistics) ,Population ecology ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Probability distribution ,Research Article ,Disease Ecology ,Plasmodium falciparum ,Population ,Research and Analysis Methods ,Bayesian inference ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Parasite Groups ,Parasitic Diseases ,Genetics ,Fraction (mathematics) ,Molecular Biology Techniques ,education ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Population Density ,Stochastic Processes ,Population Biology ,Optimal estimation ,Ecology and Environmental Sciences ,Organisms ,Biology and Life Sciences ,Reproducibility of Results ,Bayes Theorem ,Probability Theory ,Probability Distribution ,Tropical Diseases ,Parasitic Protozoans ,Malaria ,030104 developmental biology ,lcsh:Biology (General) ,Parasitology ,Population Ecology ,Apicomplexa ,Mathematics ,030217 neurology & neurosurgery - Abstract
Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects—species, taxonomical units, or gene variants, depending on the context—can be directly counted. In practice, however, only a fraction of each population’s objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty. Author summary: Understanding when two populations are composed of similar species is important for ecologists, epidemiologists, and population geneticists, and in principle it is easy: just sample the two populations, compare the sets of species identified in each, and count how many appear in both populations. In practice, however, this is difficult because sampling methods typically produce only a random subset of the total population, leaving current population overlap estimates biased. Knowing only the number of shared members between two of these partial population samples, this paper shows how we can nevertheless estimate the true overlap between the full populations, when those full populations’ sizes are known. Using Bayesian statistics, we can also compute credible intervals to produce error bars. We show that using this unbiased approach has a dramatic impact on the conclusions one might draw from previously published studies in the malaria literature, which used simple but biased methods. Because the method in this paper quantifies the tradeoff between sampling effort and uncertainty, we also show how to compute the number of samples required to ensure high-confidence results, which may be useful for planning future studies or budgeting lab reagents and time.
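To make the idea concrete, here is a minimal sketch of this kind of posterior computation, assuming a simplified model in which each truly shared object is detected in both samples independently with probability equal to the product of the sampling fractions, together with a flat prior over the overlap; the function name, parameters, and numbers are illustrative and not the authors' exact likelihood.

```python
import numpy as np
from scipy.stats import binom

def overlap_posterior(n_shared_obs, N1, N2, s1, s2):
    """Grid posterior over the true overlap z between two populations of known
    sizes N1 and N2, given that s1 of N1 and s2 of N2 objects were sampled and
    n_shared_obs objects were seen in both samples.

    Simplifying assumption: each truly shared object is detected in both
    samples independently with probability (s1/N1) * (s2/N2)."""
    p_detect = (s1 / N1) * (s2 / N2)
    z_grid = np.arange(n_shared_obs, min(N1, N2) + 1)   # feasible overlaps
    like = binom.pmf(n_shared_obs, z_grid, p_detect)     # likelihood per candidate z
    post = like / like.sum()                             # flat prior over z
    return z_grid, post

z, p = overlap_posterior(n_shared_obs=4, N1=60, N2=60, s1=20, s2=25)
print("posterior mean overlap:", float((z * p).sum()))
# A 95% credible interval can be read off the cumulative posterior.
ci = z[np.searchsorted(np.cumsum(p), [0.025, 0.975])]
print("95% credible interval:", ci)
```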
- Published
- 2019
3. A numerical approach for a discrete Markov model for progressing drug resistance of cancer
- Author
-
Masayuki Maeda and Hideaki Yamashita
- Subjects
0301 basic medicine ,Stochastic modelling ,Cancer Treatment ,0302 clinical medicine ,Neoplasms ,Range (statistics) ,Medicine and Health Sciences ,Cell Cycle and Cell Division ,lcsh:QH301-705.5 ,Mathematics ,education.field_of_study ,Ecology ,Mathematical model ,Approximation Methods ,Pharmaceutics ,Markov Chains ,Computational Theory and Mathematics ,Oncology ,Cell Processes ,Modeling and Simulation ,Physical Sciences ,symbols ,Probability distribution ,Research Article ,Computer Modeling ,Computer and Information Sciences ,Markov Models ,Biochemical Phenomena ,Population ,Markov process ,Markov model ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,symbols.namesake ,Drug Therapy ,Genetics ,Applied mathematics ,Point Mutation ,Humans ,Computer Simulation ,education ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Probability ,Models, Statistical ,Markov chain ,Models, Genetic ,Biology and Life Sciences ,Cell Biology ,Models, Theoretical ,Probability Theory ,Probability Distribution ,030104 developmental biology ,lcsh:Biology (General) ,Drug Resistance, Neoplasm ,Mutation ,030217 neurology & neurosurgery - Abstract
The presence of treatment-resistant cells is an important factor that limits the efficacy of cancer therapy, and the prospect of resistance is considered a major factor in determining the treatment strategy. Several recent studies have employed mathematical models to elucidate the dynamics by which resistant cancer cells are generated and to predict the probability that resistant cells emerge. The purpose of this paper is to present a numerical approach for computing the number of resistant cells and the probability that resistance emerges. A stochastic model was designed, and a method was developed to approximately but efficiently compute the number of resistant cells and the probability of resistance. To model the progression of cancer, a discrete-state, two-dimensional Markov process was employed whose states are the total number of cells and the number of resistant cells. Exact analysis and approximate aggregation approaches were then proposed to calculate the number of resistant cells and the probability of resistance when the cell population reaches the detection size. To confirm the accuracy of the approximation, relative errors between the exact analysis and the approximation were computed. The numerical values of our approximation method were very close to those of the exact analysis calculated for the small detection sizes M = 500, 100, and 1500. Computer simulation was then performed to confirm the accuracy of the approximation for detection sizes M = 10,000, 30,000, 50,000, 100,000, and 1,000,000. All the numerical results of the approximation fell between the upper and lower bounds of the 95% confidence intervals, and our method took less time to compute over a broad range of cell sizes. The effects of parameter changes on the probability that resistance emerges were also investigated using values computed with the approximation method. The results showed that the number of divisions until the cell population reaches the detection size is important for the emergence of resistance. The next step of the numerical approach is to compute the probability that resistance emerges under drug administration and with multiple mutations. Another effective approximation would be necessary for the analysis of the latter case. Author summary: Drug therapies for cancer have advanced dramatically since molecular-targeted drugs were introduced into medical practice; however, drug treatment often fails owing to the emergence of drug-resistant cells. A variety of approaches, including mathematical modeling, have been undertaken to clarify the mechanism of resistance and thereby avoid resistance to therapy. This paper proposes a mathematical approach that uses a stochastic model and provides the probability that resistance has emerged by the time the population reaches the detection size.
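A rough illustration of the quantity being computed, using plain Monte Carlo simulation of a toy two-type birth process rather than the exact or aggregated Markov-chain analysis described above; the mutation probability, detection size, and function name are assumptions made up for the example.

```python
import random

def prob_resistance_at_detection(M=500, mu=1e-4, n_runs=2000, seed=1):
    """Estimate the probability that at least one resistant cell is present
    when the total population first reaches the detection size M.

    Toy two-type pure-birth model: at each division a sensitive cell produces a
    resistant daughter with probability mu; resistant cells breed true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        sensitive, resistant = 1, 0
        while sensitive + resistant < M:
            # Pick the dividing cell in proportion to subpopulation size.
            if rng.random() < sensitive / (sensitive + resistant):
                if rng.random() < mu:
                    resistant += 1          # mutation at division
                else:
                    sensitive += 1
            else:
                resistant += 1
        hits += resistant > 0
    return hits / n_runs

print(prob_resistance_at_detection(M=500, mu=1e-4))
```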
- Published
- 2019
4. Informational structures: A dynamical system approach for integrated information
- Author
-
Fernando Soler-Toscano, Javier A. Galadí, José A. Langa, José R. Portillo, and Francisco J. Esteban
- Subjects
0301 basic medicine ,Theoretical computer science ,Computer science ,Information Theory ,Information theory ,Systems Science ,0302 clinical medicine ,Attractor ,lcsh:QH301-705.5 ,Ecology ,Brain ,Theories of Consciousness ,Dynamical Systems ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Graph (abstract data type) ,Algorithms ,Research Article ,Computer and Information Sciences ,Dynamical systems theory ,Consciousness ,Cognitive Neuroscience ,Models, Neurological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Differential Equations ,Genetics ,Animals ,Humans ,Dynamical system (definition) ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Structure (mathematical logic) ,Integrated information theory ,Biology and Life Sciences ,Eigenvalues ,Models, Theoretical ,Object (computer science) ,Probability Theory ,Probability Distribution ,030104 developmental biology ,Algebra ,lcsh:Biology (General) ,Linear Algebra ,Nonlinear Dynamics ,Cognitive Science ,Eigenvectors ,030217 neurology & neurosurgery ,Mathematics ,Neuroscience - Abstract
Integrated Information Theory (IIT) has nowadays become the most sensible general theory of consciousness. In addition to making very important statements, it opens the door to an abstract (mathematical) formulation of the theory. Given a mechanism in a particular state, IIT identifies a conscious experience with a conceptual structure, an informational object which exists, is composed of identified parts, and is informative, integrated and maximally irreducible. This paper introduces a space-time continuous version of the concept of integrated information. To this aim, a graph-theoretic and dynamical-systems treatment is used to define, for a given mechanism in a state for which a dynamics is settled, an Informational Structure, which is associated with the global attractor of the system at each time. By definition, the informational structure determines all the past and future behavior of the system, possesses an informational nature and, moreover, enriches all the points of the phase space with cause-effect power by means of its associated Informational Field. A detailed description of its inner structure in terms of invariants and the connections between them makes it possible to associate a transition probability matrix with each informational structure and to develop a measure of the level of integrated information of the system. Author summary: In this paper we introduce a space-time continuous version of the level of integrated information of a network on which a dynamics is defined. The concept of integrated information comes from the IIT of consciousness. Through a strict mathematical formulation, we complement the existing IIT theoretical framework from a dynamical systems perspective. In other words, we develop the bases for a continuous mathematical approach to IIT, introducing a dynamical system as the driving rule of a given mechanism. We also introduce and define the concepts of Informational Structure and Informational Field as the complex network with the power to ascertain the dynamics (past and future scenarios) of the studied phenomena. The detailed description of an informational structure shows the cause-effect power of a mechanism in a state and thus provides a characterization of the quantity and quality of information, and of the way this information is integrated. We first introduce how network patterns arise from dynamic phenomena on networks, leading to the concept of informational structure. Then, we formally introduce the mathematical objects supporting the theory, from graphs to informational structures, through the integration of dynamics on graphs with a global model of differential equations. After this, we formally present some of IIT’s postulates associated with a given mechanism. Finally, we provide the quantitative and qualitative characterization of the integrated information, and of how it depends on the geometry of the mechanism.
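As a loose numerical companion to the dynamical-systems machinery mentioned above (not the paper's construction of informational structures), the following sketch locates the stationary points of a small two-node cooperative ODE system and classifies them by the eigenvalues of a numerically estimated Jacobian; all parameters are illustrative.

```python
import numpy as np
from scipy.optimize import fsolve

# Toy 2-node cooperative Lotka-Volterra system (illustrative parameters):
#   du_i/dt = u_i * (b_i - u_i + sum_j g_ij * u_j)
b = np.array([1.0, 0.8])
g = np.array([[0.0, 0.3],
              [0.4, 0.0]])

def f(u):
    return u * (b - u + g @ u)

def jacobian(u, eps=1e-6):
    """Central-difference Jacobian of f at u."""
    J = np.zeros((2, 2))
    for j in range(2):
        du = np.zeros(2)
        du[j] = eps
        J[:, j] = (f(u + du) - f(u - du)) / (2 * eps)
    return J

# Search for equilibria (the invariants of the flow) from a few starting guesses.
for u0 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    u_star = fsolve(f, u0)
    if np.allclose(f(u_star), 0, atol=1e-8):
        eig = np.linalg.eigvals(jacobian(u_star))
        label = "stable" if np.all(eig.real < 0) else "unstable"
        print(np.round(u_star, 3), label, np.round(eig.real, 3))
```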
- Published
- 2018
5. Enzyme sequestration by the substrate: An analysis in the deterministic and stochastic domains
- Author
-
Petrides, Andreas, Vinnicombe, Glenn [0000-0002-0622-4932], and Apollo - University of Cambridge Repository
- Subjects
Computer and Information Sciences ,Research and Analysis Methods ,Biochemistry ,Polynomials ,Systems Science ,Substrate Specificity ,Protein Domains ,Post-Translational Modification ,Phosphorylation ,lcsh:QH301-705.5 ,Stochastic Processes ,Applied Mathematics ,Simulation and Modeling ,Phosphotransferases ,Phosphatases ,Biology and Life Sciences ,Proteins ,Eigenvalues ,Probability Theory ,Probability Distribution ,Enzymes ,Molecular Docking Simulation ,Algebra ,lcsh:Biology (General) ,Linear Algebra ,Physical Sciences ,Enzymology ,Mathematics ,Algorithms ,Research Article ,Dwell Time ,Protein Binding - Abstract
This paper is concerned with the potential multistability of protein concentrations in the cell. That is, situations where one protein, or a family of proteins, may sit at one of two or more different steady-state concentrations in otherwise identical cells, despite them being in the same environment. For models of multisite protein phosphorylation, for example, it has been shown that in the presence of excess substrate the achievable number of stable steady states can increase linearly with the number of phosphosites available. In this paper, we analyse the consequences of adding enzyme docking to these and similar models, with the resultant sequestration of phosphatase and kinase by the fully unphosphorylated and by the fully phosphorylated substrates respectively. In the large-molecule-numbers limit, where deterministic analysis is applicable, we prove that there are always values for these rates of sequestration which, when exceeded, limit the extent of multistability. For the models considered here, these values are much smaller than the affinity of the enzymes to the substrate when it is in a modifiable state. As substrate enzyme-sequestration is increased, we further prove that the number of steady states will inevitably be reduced to one. For smaller molecule numbers a stochastic analysis is more appropriate, where multistability in the large-molecule-numbers limit can manifest itself as multimodality of the probability distribution, with the system spending periods of time in the vicinity of one mode before jumping to another. Here, we find that substrate enzyme sequestration can induce bimodality even in systems where only a single steady state can exist at large numbers. To facilitate this analysis, we develop a weakly chained diagonally dominant M-matrix formulation of the Chemical Master Equation, allowing greater insight into the way particular mechanisms, like enzyme sequestration, can shape probability distributions and therefore exhibit different behaviour across different regimes. Author summary: Models of multisite protein phosphorylation have been of great interest to the systems biology community, largely due to their ability to exhibit multistable behaviour. In the presence of excess substrate it has been shown that the number of stable steady states achieved can increase linearly with the number of phosphosites available. In this paper, we provide a quantitative mathematical analysis of the effect that enzyme docking, and the consequent phosphatase and kinase sequestration by the unphosphorylated and the fully phosphorylated substrates respectively, has on a multisite protein phosphorylation system. The analysis is done in both the deterministic and the stochastic domains, for large and small molecule numbers respectively. We prove, by finding sufficient conditions, that in the deterministic domain substrate enzyme-sequestration must inevitably limit the extent of multistability, ultimately to one steady state, even for systems with arbitrary processivity or sequentiality (i.e. where multiple phosphorylations or dephosphorylations can happen per reaction and in any order). In contrast, in the stochastic domain it can provide bimodality even in cases where bistability is not possible for large molecule numbers.
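The following sketch illustrates the general Chemical Master Equation technique referred to above on a deliberately simple one-dimensional birth-death scheme with positive feedback, whose stationary distribution can be bimodal; it is not the paper's enzyme-sequestration model, and the rate constants are invented for the example.

```python
import numpy as np

# Toy birth-death scheme with positive feedback (illustrative rates):
#   birth(n) = b0 + b1 * n^4 / (K^4 + n^4),   death(n) = n
N = 80                                   # state space truncated at N copies
b0, b1, K = 2.0, 30.0, 15.0
birth = lambda n: b0 + b1 * n**4 / (K**4 + n**4)
death = lambda n: float(n)

# Assemble the (tridiagonal) CME generator Q, where Q[i, j] is the rate i -> j.
Q = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    if n < N:
        Q[n, n + 1] = birth(n)
    if n > 0:
        Q[n, n - 1] = death(n)
    Q[n, n] = -Q[n].sum()

# Stationary distribution: left null vector of Q (p @ Q = 0, p >= 0, sum p = 1).
w, v = np.linalg.eig(Q.T)
p = np.real(v[:, np.argmin(np.abs(w))])
p = np.abs(p) / np.abs(p).sum()

peaks = [n for n in range(1, N) if p[n] > p[n - 1] and p[n] > p[n + 1]]
print("modes of the stationary distribution:", peaks)   # expect two modes
```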
- Published
- 2018
6. Potassium and sodium microdomains in thin astroglial processes: A computational model study
- Author
-
Jim Harkin, John Wade, Liam McDaid, Harm van Zalinge, Bronac Flanagan, Alexei Verkhratsky, Kevin Breslin, Matthew C. Walker, Steve Hall, KongFatt Wong-Lin, and Renaud Blaise Jolivet
- Subjects
0301 basic medicine ,Macroglial Cells ,Physiology ,Potassium ,Biochemistry ,Physical Chemistry ,Nervous System ,0302 clinical medicine ,Animal Cells ,Medicine and Health Sciences ,lcsh:QH301-705.5 ,Membrane potential ,Neurons ,Ecology ,Neurochemistry ,Neurotransmitters ,Electrophysiology ,Chemistry ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Cellular Types ,Glutamate ,Anatomy ,Research Article ,Chemical Elements ,Sodium ,Membrane lipids ,Models, Neurological ,chemistry.chemical_element ,Glutamic Acid ,Neurophysiology ,Glial Cells ,Models, Biological ,Membrane Potential ,Ion ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Cations ,Genetics ,Extracellular ,Animals ,Computer Simulation ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Ions ,Lipid microdomain ,Computational Biology ,Biology and Life Sciences ,Cell Biology ,Dipole ,030104 developmental biology ,chemistry ,lcsh:Biology (General) ,Astrocytes ,Cellular Neuroscience ,Synapses ,Biophysics ,Extracellular Space ,030217 neurology & neurosurgery ,Neuroscience - Abstract
A biophysical model that captures molecular homeostatic control of ions at the perisynaptic cradle (PsC) is of fundamental importance for understanding the interplay between astroglial and neuronal compartments. In this paper, we develop a multi-compartmental mathematical model which proposes a novel mechanism whereby the flow of cations in thin processes is restricted by negatively charged membrane lipids, which result in the formation of deep potential wells near the dipole heads. These wells restrict the flow of cations to “hopping” between adjacent wells as they traverse the process, and this surface retention of cations will be shown to give rise to the formation of potassium (K+) and sodium (Na+) microdomains at the PsC. We further propose that a K+ microdomain formed at the PsC provides the driving force for the return of K+ to the extracellular space for uptake by the neurone, thereby preventing K+ undershoot. A slow decay of Na+ was also observed in our simulations after a period of glutamate stimulation, which is in strong agreement with experimental observations. The pathological implications of microdomain formation during neuronal excitation are also discussed. Author summary: During periods of neuronal activity, ionic homeostasis in the surrounding extracellular space (ECS) is disturbed. To provide a healthy environment for continued neuronal function, excess ions such as potassium must be buffered away from the ECS, a vital supportive role provided by astrocyte cells. It has long been thought that astrocytes not only remove ions from the ECS but also transport them to other areas of the brain where their concentrations are lower. However, while our computational model simulations agree that astrocytes do remove these ions from the ECS, they also show that these ions are mainly stored locally at the PsC to be returned to the ECS, thus restoring ionic homeostasis. Furthermore, we detail in this paper that this happens because of a previously overlooked biophysical phenomenon that is only dominant in thin astrocyte processes. The flow of these cations within thin processes is primarily by surface conduction, where they experience the attraction of fixed negative charge at the membrane inner surface. This negative charge constrains cation movement along the surface, and so their flow rate is restricted. Consequently, ions such as potassium that are released during neuronal excitation enter the PsC and are stored locally due to the low-conductance pathway between the PsC and the astrocyte soma. Our simulations also show that this local build-up of K+ is returned to the ECS after the neuronal activity dies off, which could potentially explain why K+ undershoot has not been observed; this result agrees with experimental observations. Moreover, the same mechanism can also explain the transient behaviour of Na+ ions, whereby a slow decay time constant is experimentally observed in thin processes. These findings have important implications for the role of astrocytes in regulating neuronal excitability under physiological and pathological conditions, and therefore highlight the significance of the work presented in this paper.
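A minimal sketch of the compartmental picture described above, assuming a one-dimensional chain of compartments in which K+ spreads by slow nearest-neighbour hopping from the perisynaptic cradle toward a clamped bulk compartment; all rates and the function name are illustrative, not the published multi-compartment model.

```python
import numpy as np

def simulate_k_microdomain(n_comp=20, hop_rate=0.05, influx=2.0,
                           stim_time=50.0, t_end=200.0, dt=0.01):
    """Toy 1-D compartment chain: compartment 0 is the perisynaptic cradle
    (PsC), the last compartment is clamped at the bulk (soma) concentration.
    K+ enters the PsC during stimulation and spreads by slow hopping."""
    k = np.zeros(n_comp)                             # excess [K+] per compartment (a.u.)
    trace = []
    for step in range(int(t_end / dt)):
        t = step * dt
        flux = hop_rate * (np.roll(k, 1) - 2 * k + np.roll(k, -1))   # hopping
        flux[0] = hop_rate * (k[1] - k[0])           # closed end at the PsC
        k += dt * flux
        if t < stim_time:
            k[0] += dt * influx                      # synaptic K+ release into the PsC
        k[-1] = 0.0                                  # bulk/soma held at baseline
        trace.append(k[0])
    return np.array(trace)

trace = simulate_k_microdomain()
print("peak PsC excess K+  :", round(trace.max(), 2))
print("end-of-run excess K+:", round(trace[-1], 2))  # slow decay after stimulation
```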
- Published
- 2018
- Full Text
- View/download PDF
7. Bayesian inference of phylogenetic networks from bi-allelic genetic markers
- Author
-
Heidi M. Meudt, Yun Yu, Jiafan Zhu, Luay Nakhleh, and Dingqiao Wen
- Subjects
0301 basic medicine ,0106 biological sciences ,Computer science ,Gene Identification and Analysis ,Genetic Networks ,01 natural sciences ,Coalescent theory ,Bayes' theorem ,Computational phylogenetics ,Statistical inference ,lcsh:QH301-705.5 ,Genome Evolution ,Phylogeny ,Data Management ,Genetics ,Recombination, Genetic ,0303 health sciences ,Likelihood Functions ,Ecology ,Phylogenetic tree ,Applied Mathematics ,Simulation and Modeling ,Nucleic Acid Hybridization ,Phylogenetic Analysis ,Plantaginaceae ,Phylogenetic network ,Genomics ,Phylogenetics ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,symbols ,Network Analysis ,Algorithms ,Research Article ,Genetic Markers ,Computer and Information Sciences ,Computational biology ,Biology ,Bayesian inference ,Research and Analysis Methods ,Genes, Plant ,010603 evolutionary biology ,Polymorphism, Single Nucleotide ,Molecular Evolution ,Cellular and Molecular Neuroscience ,03 medical and health sciences ,symbols.namesake ,Evolutionary Systematics ,Computer Simulation ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Alleles ,030304 developmental biology ,Taxonomy ,Probability ,Evolutionary Biology ,Models, Genetic ,Biology and Life Sciences ,Computational Biology ,Markov chain Monte Carlo ,Bayes Theorem ,Tree (graph theory) ,030104 developmental biology ,lcsh:Biology (General) ,Genetic Loci ,Mathematics ,Software ,New Zealand - Abstract
Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined both a model of evolution for the bi-allelic markers and the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method utilizes a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method has very good performance in terms of accuracy and robustness, as we demonstrate on simulated data as well as on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package. Author summary: The availability of genomic data has revolutionized the study of evolutionary histories and phylogeny inference. Inferring evolutionary histories from genomic data requires, in most cases, accounting for the fact that different genomic regions could have evolutionary histories that differ from each other as well as from that of the species from which the genomes were sampled. In this paper, we introduce a method for inferring evolutionary histories while accounting for two processes that could give rise to such differences across the genomes, namely incomplete lineage sorting and hybridization. We introduce a novel algorithm for computing the likelihood of phylogenetic networks from bi-allelic genetic markers and use it in a Bayesian inference method. Analyses of synthetic and empirical data sets show a very good performance of the method in terms of the estimates it obtains.
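The full network sampler is far beyond a short example, but the following sketch shows the basic Metropolis-Hastings ingredient that such Bayesian machinery builds on, applied to a toy allele-frequency model with a known conjugate answer; the data and proposal scale are invented for illustration and none of the network-specific machinery is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: counts of the '1' allele at one bi-allelic marker in a sample.
n_alleles, n_ones = 40, 12

def log_post(p):
    """Flat Beta(1,1) prior + binomial likelihood (unnormalised log posterior)."""
    if not 0.0 < p < 1.0:
        return -np.inf
    return n_ones * np.log(p) + (n_alleles - n_ones) * np.log(1 - p)

# Random-walk Metropolis-Hastings: the elementary building block that the far
# more elaborate samplers over phylogenetic networks are constructed from.
p, samples = 0.5, []
for _ in range(20000):
    prop = p + rng.normal(scale=0.05)
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

samples = np.array(samples[5000:])             # discard burn-in
print("posterior mean:", samples.mean())        # conjugate answer: 13/42 ≈ 0.310
```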
- Published
- 2018
8. A metabolic core model elucidates how enhanced utilization of glucose and glutamine, with enhanced glutamine-dependent lactate production, promotes cancer cell growth
- Author
-
Damiani, Chiara, Colombo, Riccardo, Gaglio, Daniela, Mastroianni, Fabrizia, Pescini, Dario, Westerhoff, Hans Victor, Mauri, Giancarlo, Vanoni, Marco, Alberghina, Lilia, Synthetic Systems Biology (SILS, FNWI), Molecular Cell Physiology, and AIMMS
- Subjects
Metabolic Processes ,0301 basic medicine ,Glucose uptake ,Glutamine ,Biochemistry ,7. Clean energy ,Glucose Metabolism ,Drug Metabolism ,Metabolic Flux Analysi ,Neoplasms ,Metabolic flux analysis ,Medicine and Health Sciences ,Amino Acids ,lcsh:QH301-705.5 ,Ecology ,Organic Compounds ,Acidic Amino Acids ,Monosaccharides ,Ketones ,Enzymes ,Flux balance analysis ,Chemistry ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Carbohydrate Metabolism ,Oxidoreductases ,Metabolic Networks and Pathways ,Research Article ,Chemical Elements ,Human ,Pyruvate ,Citric Acid Cycle ,Carbohydrates ,Biology ,Models, Biological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetic ,Manchester Institute of Biotechnology ,Genetics ,Animals ,Humans ,Pharmacokinetics ,Computer Simulation ,Lactic Acid ,Molecular Biology ,Dehydrogenases ,Ecology, Evolution, Behavior and Systematics ,Cell Proliferation ,Pharmacology ,Organic Chemistry ,Chemical Compounds ,Biology and Life Sciences ,Proteins ,Metabolic Networks and Pathway ,Metabolism ,ResearchInstitutes_Networks_Beacons/manchester_institute_of_biotechnology ,Metabolic Flux Analysis ,Oxygen ,Citric acid cycle ,Metabolic pathway ,030104 developmental biology ,Glucose ,lcsh:Biology (General) ,Enzymology ,Acids ,Flux (metabolism) - Abstract
Cancer cells share several metabolic traits, including aerobic production of lactate from glucose (Warburg effect), extensive glutamine utilization and impaired mitochondrial electron flow. It is still unclear how these metabolic rearrangements, which may involve different molecular events in different cells, contribute to a selective advantage for cancer cell proliferation. To ascertain which metabolic pathways are used to convert glucose and glutamine to balanced energy and biomass production, we performed systematic constraint-based simulations of a model of human central metabolism. Sampling of the feasible flux space allowed us to obtain a large number of randomly mutated cells simulated at different glutamine and glucose uptake rates. We observed that, in the limited subset of proliferating cells, most displayed fermentation of glucose to lactate in the presence of oxygen. At high utilization rates of glutamine, oxidative utilization of glucose was decreased, while the production of lactate from glutamine was enhanced. This emergent phenotype was observed only when the available carbon exceeded the amount that could be fully oxidized by the available oxygen. Under the latter conditions, standard Flux Balance Analysis indicated that this metabolic pattern is optimal to maximize biomass and ATP production; that it requires the activity of a branched TCA cycle, in which glutamine-dependent reductive carboxylation contributes to the production of lipids and proteins; and that it is sustained by a variety of redox-controlled metabolic reactions. In a K-ras transformed cell line we experimentally assessed glutamine-induced metabolic changes. We validated the computational results through an extension of Flux Balance Analysis that allows prediction of metabolite variations. Taken together, these findings offer new understanding of the logic of the metabolic reprogramming that underlies cancer cell growth. Author summary: Hallmarks describing common key events in the initiation, maintenance and progression of cancer have been identified. One hallmark deals with the rewiring of metabolic reactions required to sustain enhanced cell proliferation. The availability of molecular, mechanistic models of cancer hallmarks will greatly improve optimized personal treatment and new drug discovery. Metabolism is the only hallmark for which it is currently possible to derive large-scale mathematical models that have predictive ability. In this paper, we exploit a constraint-based model of the core metabolism required for biomass conversion of the most relevant nutrients—glucose and glutamine—to clarify the logic of control of cancer metabolism. We newly report that, when the available oxygen is not sufficient to fully oxidize the available glucose and glutamine carbons (a situation compatible with that observed under normal oxygen conditions in humans and in cancer cells growing in vitro), utilization of glutamine by reductive carboxylation and conversion of glucose and glutamine to lactate confer an advantage for biomass production. Redox homeostasis can be maintained through the use of different alternative pathways. In conclusion, this paper offers a logical interpretation of the link between metabolic rewiring and enhanced proliferation, which may offer new approaches to targeted drug discovery and utilization.
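A compact illustration of the constraint-based reasoning described above: a toy flux balance analysis posed as a linear program with scipy, in which capping a "respiration" flux (a stand-in for limited oxygen) makes lactate export appear in the optimal solution. The stoichiometry, bounds, and reaction names are invented for the example and are not the paper's core model.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix. Metabolites (rows): G (glucose), P (pyruvate),
# L (lactate), E (energy). Reactions (columns):
#   v0 uptake: -> G          v1 glycolysis: G -> 2 P + 2 E
#   v2 fermentation: P -> L  v3 oxidation: P -> 8 E
#   v4 lactate export: L ->  v5 biomass: P + 6 E ->
S = np.array([
    #  v0  v1  v2  v3  v4  v5
    [  1, -1,  0,  0,  0,  0],   # G
    [  0,  2, -1, -1,  0, -1],   # P
    [  0,  0,  1,  0, -1,  0],   # L
    [  0,  2,  0,  8,  0, -6],   # E
], dtype=float)

bounds = [(0, 10),                 # glucose uptake capped
          (0, None), (0, None),
          (0, 3),                  # oxidation capped, mimicking limited oxygen
          (0, None), (0, None)]

# FBA: maximise the biomass flux v5 subject to steady state S v = 0.
res = linprog(c=-np.eye(6)[5], A_eq=S, b_eq=np.zeros(4),
              bounds=bounds, method="highs")
v = res.x
print("optimal fluxes:", np.round(v, 2))
print("biomass:", round(v[5], 2), " lactate export:", round(v[4], 2))
```

With the oxidation cap active, the optimum routes the surplus pyruvate to lactate, which is the qualitative behaviour the abstract attributes to carbon exceeding the oxidative capacity.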
- Published
- 2017
- Full Text
- View/download PDF
9. Rearrangement moves on rooted phylogenetic networks
- Author
-
Gambette, Philippe, van Iersel, Leo, Jones, Mark, Lafond, Manuel, Pardi, Fabio, and Scornavacca, Celine
- Subjects
Optimization ,Evolutionary Genetics ,Computer and Information Sciences ,Evolutionary Processes ,nearest-neighbor interchange ,Gene Transfer ,[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM] ,phylogeny ,Microbiology ,phylogenetic networks ,Genetics ,Animals ,Humans ,Evolutionary Systematics ,lcsh:QH301-705.5 ,Phylogeny ,Taxonomy ,Data Management ,Horizontal Gene Transfer ,Gene Rearrangement ,Evolutionary Biology ,Models, Genetic ,subtree pruning and regrafting ,Biology and Life Sciences ,Computational Biology ,Phylogenetic Analysis ,Hominidae ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Organismal Evolution ,Phylogenetics ,lcsh:Biology (General) ,Physical Sciences ,Microbial Evolution ,rearrangement moves ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Mathematics ,Network Analysis ,Research Article - Abstract
Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks—that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves—named rNNI and rSPR—reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results—separating the contributions of horizontal and vertical moves—we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events are clearly identified. Moreover, our rearrangement moves are robust to the fact that networks of higher complexity usually allow a better fit to the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction. Author summary: Phylogenetic networks are used to represent reticulate evolution, that is, cases in which the tree-of-life metaphor for evolution breaks down because some of its branches have merged at one or several points in the past. This may occur, for example, when some organisms in the phylogeny are hybrids. In this paper, we deal with an elementary question for the reconstruction of phylogenetic networks: how to explore the space of all possible networks. The fundamental component for this is the set of operations that should be employed to generate alternative hypotheses for what happened in the past—operations which serve as basic building blocks for optimization techniques such as hill-climbing. Although these approaches have a long tradition in classic tree-based phylogenetics, their application to networks that explicitly represent reticulate evolution is relatively unexplored. This paper provides the fundamental definitions and theoretical results for subsequent work on practical methods for phylogenetic network reconstruction: we subdivide networks into layers, according to a generally accepted measure of their complexity, and provide operations that make it possible both to fully explore each layer and to move across different layers. These operations constitute natural generalizations of well-known operations for exploring the space of phylogenetic trees, the lowest layer in the hierarchy described above.
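As a small concrete companion, the sketch below implements the rooted-tree special case that the paper's rNNI moves reduce to (nearest-neighbour interchange on rooted binary trees) together with a bare-bones hill-climbing loop; the tuple representation and the toy scoring function are assumptions for illustration, and no network moves are implemented.

```python
def nni_neighbors(tree):
    """Rooted trees one nearest-neighbour interchange (NNI) away from `tree`.
    A rooted binary tree is either a leaf (a string) or a pair (left, right).
    An NNI across the edge into an internal child swaps one of its children
    with that child's sibling."""
    if isinstance(tree, str):
        return []
    left, right = tree
    out = []
    if not isinstance(right, str):          # NNI across the edge into `right`
        b, c = right
        out += [(b, (left, c)), (c, (b, left))]
    if not isinstance(left, str):           # NNI across the edge into `left`
        a, b = left
        out += [((right, b), a), ((a, right), b)]
    out += [(l2, right) for l2 in nni_neighbors(left)]    # moves deeper in subtrees
    out += [(left, r2) for r2 in nni_neighbors(right)]
    return out

def hill_climb(tree, score, steps=100):
    """Greedy local search over tree space using NNI moves: the skeleton that
    network moves such as rNNI/rSPR would plug into."""
    for _ in range(steps):
        better = [t for t in nni_neighbors(tree) if score(t) > score(tree)]
        if not better:
            return tree
        tree = max(better, key=score)
    return tree

start = ((("A", "B"), "C"), "D")
# Toy score: reward trees in which "C" and "D" are siblings (a stand-in for a
# likelihood or parsimony criterion).
score = lambda t: str(t).count("('C', 'D')") + str(t).count("('D', 'C')")
print(hill_climb(start, score))
```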
- Published
- 2017
- Full Text
- View/download PDF
10. Personalized glucose forecasting for type 2 diabetes using data assimilation
- Author
-
David J. Albers, George Hripcsak, Matthew E. Levine, Lena Mamykina, Henry N. Ginsberg, and Bruce J. Gluckman
- Subjects
Blood Glucose ,Male ,Patient-Specific Modeling ,Computer science ,Physiology ,computer.software_genre ,Biochemistry ,0302 clinical medicine ,Data assimilation ,Endocrinology ,Mathematical and Statistical Techniques ,Medicine and Health Sciences ,Diabetes diagnosis and management ,Insulin ,030212 general & internal medicine ,lcsh:QH301-705.5 ,Computational model ,Ecology ,Organic Compounds ,Monosaccharides ,Non-insulin-dependent diabetes--Nutritional aspects ,Regression ,Blood Sugar ,3. Good health ,Type 2 Diabetes ,Body Fluids ,Chemistry ,Blood ,Computational Theory and Mathematics ,Modeling and Simulation ,Physical Sciences ,Female ,Anatomy ,Algorithms ,Statistics (Mathematics) ,Research Article ,Adult ,HbA1c ,Endocrine Disorders ,Carbohydrates ,030209 endocrinology & metabolism ,Bayesian inference ,Machine learning ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetics ,Diabetes Mellitus ,Humans ,Hemoglobin ,Statistical Methods ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Simulation ,Glycemic ,Nutrition ,Biology and life sciences ,business.industry ,Model selection ,Organic Chemistry ,Chemical Compounds ,Correction ,Computational Biology ,Proteins ,Kalman filter ,Diagnostic medicine ,Glucose ,Diabetes Mellitus, Type 2 ,lcsh:Biology (General) ,Metabolic Disorders ,Artificial intelligence ,business ,computer ,Mathematics ,Forecasting - Abstract
Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating the glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast the impact of a given meal on an individual’s blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to choose a personalized model for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed expert forecasts in accuracy. We conclude by examining ways to present predictions as forecast-derived range quantities and by evaluating the comparative advantages of these ranges. Author summary: Type 2 diabetes is a devastating disease that requires constant patient self-management of glucose, insulin, nutrition and exercise. Nevertheless, glucose and insulin dynamics are complicated, nonstationary, nonlinear, and individual-dependent, making self-management of diabetes a complex task. To help alleviate some of the difficulty for patients, we develop a method for personalized, real-time glucose forecasting based on nutrition. Specifically, we create and evaluate computational machinery, based on both Gaussian process models and data assimilation, that leverages the physiologic knowledge of two mechanistic models to produce a personalized, nutrition-based glucose forecast for individuals with type 2 diabetes in real time that is robust to sparse data and nonstationary patients. Our computational engine was conceived to be of potential use for diabetes self-management.
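A minimal sketch of the filtering idea, assuming a scalar linear-Gaussian toy glucose model rather than the paper's mechanistic models and dual unscented Kalman filter; the constants, meal input, and sparse measurements below are invented for illustration.

```python
import numpy as np

# Toy linear-Gaussian glucose model: glucose decays toward a baseline, with a
# meal-driven input and sparse noisy self-monitoring measurements.
dt, k, baseline = 5.0, 0.02, 100.0         # minutes, 1/min, mg/dL
Q, R = 4.0, 100.0                           # process / measurement noise variances

def kalman_forecast(meals, measurements, x0=100.0, P0=400.0):
    """meals: input per step; measurements: observed glucose or None per step."""
    x, P, est = x0, P0, []
    F = 1.0 - k * dt                        # discrete-time decay factor
    for meal, y in zip(meals, measurements):
        # Predict.
        x = F * x + k * dt * baseline + dt * meal
        P = F * P * F + Q
        # Update only when a fingerstick measurement is available.
        if y is not None:
            K = P / (P + R)
            x = x + K * (y - x)
            P = (1 - K) * P
        est.append(x)
    return est

steps = 60
meals = [3.0 if 10 <= i < 16 else 0.0 for i in range(steps)]     # one meal bolus
measurements = [None] * steps
measurements[12], measurements[30] = 150.0, 115.0                # sparse readings
print([round(v) for v in kalman_forecast(meals, measurements)[::10]])
```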
- Published
- 2017
11. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
- Author
-
Haixu Tang, Sujun Li, and Yuzhen Ye
- Subjects
0301 basic medicine ,Proteomics ,Peptide ,Plant Science ,Biochemistry ,De Bruijn graph ,Database and Informatics Methods ,Tandem Mass Spectrometry ,Database Searching ,Photosynthesis ,lcsh:QH301-705.5 ,chemistry.chemical_classification ,Ecology ,Plant Biochemistry ,Microbiota ,Genomics ,6. Clean water ,Computational Theory and Mathematics ,Modeling and Simulation ,symbols ,Sequence Analysis ,Algorithms ,Research Article ,Gene prediction ,Sequence Databases ,Computational biology ,Biology ,Research and Analysis Methods ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,symbols.namesake ,Genetics ,Ribulose-1,5-Bisphosphate Carboxylase Oxygenase ,Humans ,Molecular Biology Techniques ,Sequencing Techniques ,Sequence Similarity Searching ,Gene Prediction ,Gene ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Sequence Assembly Tools ,Biology and Life Sciences ,Computational Biology ,Proteins ,Genome Analysis ,030104 developmental biology ,Biological Databases ,lcsh:Biology (General) ,chemistry ,Metagenomics ,Metaproteomics ,Protein identification ,Peptides - Abstract
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmentary protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because the protein-coding genes predicted from metagenomes are incomplete and fragmented. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in the degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. Author Summary: In recent years, meta-omic (including metatranscriptomic and metaproteomic) techniques have been adopted as complementary approaches to metagenomic sequencing to study the functional characteristics and dynamics of microbial communities, aiming at a holistic understanding of how a community responds to changes in the environment. Currently, metaproteomic data are largely analyzed using bioinformatics tools originally designed for bottom-up proteomics. In particular, recent metaproteomic studies have employed a metagenome-guided approach, in which complete or fragmentary protein-coding genes are first predicted from metagenomic sequences (i.e., contigs or scaffolds) acquired from the matched community samples, and the predicted protein sequences are then used in peptide identification. A key challenge of this approach is that the protein-coding genes predicted from assembled metagenomic contigs can be incomplete and fragmented due to the complexity of metagenomic samples and the short read lengths in metagenomic sequencing. To address this issue, we present in this paper a graph-centric approach that exploits the de Bruijn graph structure reported by metagenome assembly algorithms to improve metagenome-guided peptide and protein identification in metaproteomics. We show that our method can identify many more peptides and proteins, improving the characterization of the proteins expressed in the microbial communities.
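A toy version of the graph-centric idea, assuming short error-free DNA reads: build a de Bruijn graph and spell out non-branching paths into contig-like sequences, from which a protein database could then be derived; the read set and function names are illustrative and this is not the Graph2Pro implementation.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer[1:] not in graph[kmer[:-1]]:
                graph[kmer[:-1]].append(kmer[1:])
    return graph

def simple_contigs(graph):
    """Spell out maximal non-branching paths starting from source nodes."""
    indeg = defaultdict(int)
    for node, succs in graph.items():
        for s in succs:
            indeg[s] += 1
    contigs = []
    for start in list(graph):
        if indeg[start] != 0:          # only start paths at source-like nodes
            continue
        seq, node = start, start
        while len(graph.get(node, [])) == 1:
            node = graph[node][0]
            seq += node[-1]
        contigs.append(seq)
    return contigs

# Three overlapping short reads from one hypothetical coding region.
reads = ["ATGGCGTGC", "CGTGCAATT", "GCAATTCG"]
graph = de_bruijn(reads, k=5)
print(simple_contigs(graph))   # reconstructs the longer underlying sequence
```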
- Published
- 2016
12. Annealed Importance Sampling for Neural Mass Models
- Author
-
Will Penny and Biswa Sengupta
- Subjects
Optimization ,Imaging Techniques ,Models, Neurological ,Action Potentials ,Neuroimaging ,Linear Regression Analysis ,Research and Analysis Methods ,Mathematical and Statistical Techniques ,Differential Equations ,Connectome ,Computer Simulation ,Statistical Methods ,Signal to Noise Ratio ,lcsh:QH301-705.5 ,Neurons ,Models, Statistical ,Approximation Methods ,Applied Mathematics ,Simulation and Modeling ,Biology and Life Sciences ,Brain ,Monte Carlo method ,lcsh:Biology (General) ,Sample Size ,Physical Sciences ,Signal Processing ,Regression Analysis ,Engineering and Technology ,Nerve Net ,Mathematics ,Statistics (Mathematics) ,Algorithms ,Research Article ,Neuroscience - Abstract
Neural Mass Models provide a compact description of the dynamical activity of cell populations in neocortical regions. Moreover, models of regional activity can be connected together into networks, and inferences made about the strength of connections, using M/EEG data and Bayesian inference. To date, however, Bayesian methods have been largely restricted to the Variational Laplace (VL) algorithm, which assumes that the posterior distribution is Gaussian and finds model parameters that are only locally optimal. This paper explores the use of Annealed Importance Sampling (AIS) to address these restrictions. We implement AIS using proposals derived from Langevin Monte Carlo (LMC), which uses local gradient and curvature information for efficient exploration of parameter space. In terms of the estimation of Bayes factors, VL and AIS agree about which model is best but report different degrees of belief. Additionally, AIS finds better model parameters and we find evidence of non-Gaussianity in their posterior distribution. Author Summary: The activity of populations of neurons in the human brain can be described using a set of differential equations known as a neural mass model. These models can then be connected to describe activity in multiple brain regions and, by fitting them to human brain imaging data, statistical inferences can be made about changes in macroscopic connectivity among brain regions. For example, the strength of a connection from one region to another may be more strongly engaged in a particular patient population or during a specific cognitive task. Current statistical inference approaches use a Bayesian algorithm based on principles of local optimization and on the assumption that uncertainty about model parameters (e.g. connectivity), having seen the data, follows a Gaussian distribution. This paper evaluates current methods against a global Bayesian optimization algorithm and finds that the two approaches (local/global) agree about which model is best, but that the global approach produces better parameter estimates.
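A compact annealed importance sampling sketch for a one-dimensional conjugate Gaussian model whose exact log evidence is known, using simple random-walk Metropolis transitions instead of the Langevin proposals used in the paper; the model and tuning constants are invented for illustration.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy conjugate model: theta ~ N(0, s0^2), one observation y ~ N(theta, s^2),
# so the exact log evidence is log N(y; 0, s0^2 + s^2).
s0, s, y = 2.0, 0.5, 1.3
log_prior = lambda th: norm.logpdf(th, 0.0, s0)
log_lik = lambda th: norm.logpdf(y, th, s)

def ais_log_evidence(n_chains=500, n_temps=50, mh_scale=0.3):
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    theta = rng.normal(0.0, s0, n_chains)          # exact samples from the prior
    logw = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += (b - b_prev) * log_lik(theta)      # importance-weight increment
        # One Metropolis step per chain, targeting prior * likelihood^b.
        prop = theta + rng.normal(0.0, mh_scale, n_chains)
        log_alpha = (log_prior(prop) + b * log_lik(prop)
                     - log_prior(theta) - b * log_lik(theta))
        accept = np.log(rng.uniform(size=n_chains)) < log_alpha
        theta = np.where(accept, prop, theta)
    return logsumexp(logw) - np.log(n_chains)

print("AIS estimate :", round(ais_log_evidence(), 3))
print("exact log Z  :", round(norm.logpdf(y, 0.0, np.hypot(s0, s)), 3))
```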
- Published
- 2016
13. Ten Simple Rules for Creating a Good Data Management Plan
- Author
-
William K. Michener
- Subjects
0106 biological sciences ,Databases, Factual ,Computer science ,Data management ,Information Storage and Retrieval ,Guidelines as Topic ,010603 evolutionary biology ,01 natural sciences ,Data modeling ,Data governance ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetics ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Data administration ,0303 health sciences ,Ecology ,business.industry ,Data discovery ,Data management plan ,Data science ,Data warehouse ,Data Accuracy ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Research Design ,Modeling and Simulation ,Data quality ,Perspective ,Database Management Systems ,business ,Algorithms - Abstract
Research papers and data products are key outcomes of the science enterprise. Governmental, nongovernmental, and private foundation sponsors of research are increasingly recognizing the value of research data. As a result, most funders now require that sufficiently detailed data management plans be submitted as part of a research proposal. A data management plan (DMP) is a document that describes how you will treat your data during a project and what happens with the data after the project ends. Such plans typically cover all or portions of the data life cycle—from data discovery, collection, and organization (e.g., spreadsheets, databases), through quality assurance/quality control, documentation (e.g., data types, laboratory methods) and use of the data, to data preservation and sharing with others (e.g., data policies and dissemination approaches). Fig 1 illustrates the relationship between hypothetical research and data life cycles and highlights the links to the rules presented in this paper. The DMP undergoes peer review and is used in part to evaluate a project’s merit. Plans also document the data management activities associated with funded projects and may be revisited during performance reviews. Fig 1. Relationship of the research life cycle (A) to the data life cycle (B); note: highlighted circles refer to the rules that are most closely linked to the steps of the data life cycle. As part of the research life cycle (A), many researchers (1) test ideas and hypotheses by (2) acquiring data that are (3) incorporated into various analyses and visualizations, leading to interpretations that are then (4) published in the literature and disseminated via other mechanisms (e.g., conference presentations, blogs, tweets), and that often lead back to (1) new ideas and hypotheses. During the data life cycle (B), researchers typically (1) develop a plan for how data will be managed during and after the project; (2) discover and acquire existing data and (3) collect and organize new data; (4) assure the quality of the data; (5) describe the data (i.e., ascribe metadata); (6) use the data in analyses, models, visualizations, etc.; and (7) preserve and (8) share the data with others (e.g., researchers, students, decision makers), possibly leading to new ideas and hypotheses.
- Published
- 2015
14. Discrete Element Framework for Modelling Extracellular Matrix, Deformable Cells and Subcellular Components
- Author
-
David Smith, Grand Roman Joldes, Addison J. Rich, Antony W. Burgess, Chin Wee Tan, Bruce S. Gardiner, and Kelvin K. L. Wong
- Subjects
Integrin ,Mechanotransduction, Cellular ,Models, Biological ,Cell Physiological Phenomena ,Extracellular matrix ,Cellular and Molecular Neuroscience ,Cellular Microenvironment ,Genetics ,Animals ,Humans ,Computer Simulation ,Mechanotransduction ,Cytoskeleton ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Models, Statistical ,Ecology ,biology ,Extracellular Matrix ,Membrane ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,biology.protein ,Discrete particle ,Biological system ,Discrete modelling ,Research Article ,Subcellular Fractions - Abstract
This paper presents a framework for modelling biological tissues based on discrete particles. Cell components (e.g. cell membranes, cell cytoskeleton, cell nucleus) and the extracellular matrix (e.g. collagen) are represented using collections of particles. Simple particle-to-particle interaction laws are used to simulate and control complex physical interaction types (e.g. cell-cell adhesion via cadherins, integrin attachment to the basement membrane, cytoskeletal mechanical properties). Particles may be given the capacity to change their properties and behaviours in response to changes in the cellular microenvironment (e.g., in response to cell-cell signalling or mechanical loadings). Each particle is in effect an ‘agent’, meaning that the agent can sense local environmental information and respond according to pre-determined or stochastic events. The behaviour of the proposed framework is exemplified through several biological problems of ongoing interest. These examples illustrate how the modelling framework allows enormous flexibility for representing the mechanical behaviour of different tissues, and we argue this is a more intuitive approach than that offered by traditional continuum methods. Because of this flexibility, we believe the discrete modelling framework provides an avenue for biologists and bioengineers to explore the behaviour of tissue systems in a computational laboratory. Author Summary: Modelling is an important tool in understanding the behaviour of biological tissues. In this paper we advocate a new modelling framework in which cells and tissues are represented by collections of particles with associated properties. The particles interact with each other and can change their behaviour in response to changes in their environment. We demonstrate how the proposed framework can be used to represent the mechanical behaviour of different tissues with much greater flexibility compared to traditional continuum-based methods.
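A minimal sketch of the discrete-particle idea, assuming an overdamped update with spring-like forces along a chain of linked particles plus short-range pair repulsion; the force laws, parameters, and function name are illustrative stand-ins for the framework's interaction laws, not the published implementation.

```python
import numpy as np

def simulate_particles(pos, k=5.0, rest=1.0, repulse=2.0, dt=0.005, steps=2000):
    """Toy discrete-element step: linked neighbours interact like springs with
    resting length `rest`, and all pairs repel softly at short range.
    `pos` is an (n, 2) array of particle positions; returns the relaxed layout."""
    n = len(pos)
    links = [(i, i + 1) for i in range(n - 1)]       # a simple chain of links
    for _ in range(steps):
        force = np.zeros_like(pos)
        for i, j in links:                           # spring forces along links
            d = pos[j] - pos[i]
            r = np.linalg.norm(d) + 1e-12
            f = k * (r - rest) * d / r
            force[i] += f
            force[j] -= f
        for i in range(n):                           # short-range pair repulsion
            for j in range(i + 1, n):
                d = pos[j] - pos[i]
                r = np.linalg.norm(d) + 1e-12
                if r < rest:
                    f = repulse * (rest - r) * d / r
                    force[i] -= f
                    force[j] += f
        pos = pos + dt * force                       # overdamped position update
    return pos

start = np.random.default_rng(1).uniform(0, 0.5, size=(6, 2))
print(np.round(simulate_particles(start), 2))
```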
- Published
- 2015
15. Reinforcement Learning of Linking and Tracing Contours in Recurrent Neural Networks
- Author
-
Heiko Neumann, Pieter R. Roelfsema, and Tobias Brosch
- Subjects
Visual perception ,Computer science ,Models, Neurological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Memory ,Learning rule ,Genetics ,medicine ,Animals ,Reinforcement learning ,Computer Simulation ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Object-based attention ,Visual Cortex ,030304 developmental biology ,Feedback, Physiological ,0303 health sciences ,Ecology ,Artificial neural network ,business.industry ,Visual cortex ,medicine.anatomical_structure ,Recurrent neural network ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Learning curve ,Modeling and Simulation ,Visual Perception ,Macaca ,Artificial intelligence ,Nerve Net ,business ,Reinforcement, Psychology ,030217 neurology & neurosurgery ,Research Article - Abstract
The processing of a visual stimulus can be subdivided into a number of stages. Upon stimulus presentation there is an early phase of feedforward processing where the visual information is propagated from lower to higher visual areas for the extraction of basic and complex stimulus features. This is followed by a later phase where horizontal connections within areas and feedback connections from higher areas back to lower areas come into play. In this later phase, image elements that are behaviorally relevant are grouped by Gestalt grouping rules and are labeled in the cortex with enhanced neuronal activity (object-based attention in psychology). Recent neurophysiological studies revealed that reward-based learning influences these recurrent grouping processes, but it is not well understood how rewards train recurrent circuits for perceptual organization. This paper examines the mechanisms for reward-based learning of new grouping rules. We derive a learning rule that can explain how rewards influence the information flow through feedforward, horizontal and feedback connections. We illustrate the efficiency with two tasks that have been used to study the neuronal correlates of perceptual organization in early visual cortex. The first task is called contour-integration and demands the integration of collinear contour elements into an elongated curve. We show how reward-based learning causes an enhancement of the representation of the to-be-grouped elements at early levels of a recurrent neural network, just as is observed in the visual cortex of monkeys. The second task is curve-tracing where the aim is to determine the endpoint of an elongated curve composed of connected image elements. If trained with the new learning rule, neural networks learn to propagate enhanced activity over the curve, in accordance with neurophysiological data. We close the paper with a number of model predictions that can be tested in future neurophysiological and computational studies., Author Summary Our experience with the visual world allows us to group image elements that belong to the same perceptual object and to segregate them from other objects and the background. If subjects learn to group contour elements, this experience influences neuronal activity in early visual cortical areas, including the primary visual cortex (V1). Learning presumably depends on alterations in the pattern of connections within and between areas of the visual cortex. However, the processes that control changes in connectivity are not well understood. Here we present the first computational model that can train a neural network to integrate collinear contour elements into elongated curves and to trace a curve through the visual field. The new learning algorithm trains fully recurrent neural networks, provided the connectivity causes the networks to reach a stable state. The model reproduces the behavioral performance of monkeys trained in these tasks and explains the patterns of neuronal activity in the visual cortex that emerge during learning, which is remarkable because the only feedback for the model is a reward for successful trials. We discuss a number of the model predictions that can be tested in future neuroscientific work.
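To make the idea of reward-gated plasticity concrete, here is a minimal Python sketch of a generic reward-modulated Hebbian update on a small feedforward toy network; it is not the paper's learning rule (which trains recurrent, horizontal, and feedback connections), and the toy task, architecture, and learning rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
lr, r_baseline = 0.05, 0.0

def forward(x):
    h = np.tanh(W1 @ x)
    z = W2 @ h
    p = np.exp(z - z.max())
    p /= p.sum()                       # softmax over the two possible responses
    return h, p

for trial in range(5000):
    x = rng.random(n_in)
    target = int(np.argmax(x) == 0)    # toy task: is the first input the largest?
    h, p = forward(x)
    action = rng.choice(n_out, p=p)
    reward = 1.0 if action == target else 0.0

    # A reward-prediction error gates a Hebbian (presynaptic * postsynaptic) update.
    delta = reward - r_baseline
    r_baseline += 0.01 * (reward - r_baseline)
    post = -p.copy()
    post[action] += 1.0                # REINFORCE-style gradient of log p(action)
    W2 += lr * delta * np.outer(post, h)
    W1 += lr * delta * np.outer((W2.T @ post) * (1.0 - h**2), x)
```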
- Published
- 2015
16. The Problem with Phi: A Critique of Integrated Information Theory
- Author
-
Michael A. Cerullo
- Subjects
Philosophy of mind ,Biomedical Research ,Electromagnetic theories of consciousness ,Consciousness ,Computer science ,media_common.quotation_subject ,Functionalism (philosophy of mind) ,Models, Neurological ,Information Theory ,Qualia ,Cellular and Molecular Neuroscience ,Empirical research ,Argument ,Genetics ,Humans ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,media_common ,Cognitive science ,Ecology ,Integrated information theory ,Epistemology ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Perspective ,Algorithms ,Split-Brain Procedure - Abstract
Summary In the last decade, Giulio Tononi has developed the Integrated Information Theory (IIT) of consciousness. IIT postulates that consciousness is equal to integrated information (Φ). The goal of this paper is to show that IIT fails in its stated goal of quantifying consciousness. The paper will challenge the theoretical and empirical arguments in support of IIT. The main theoretical argument for the relevance of integrated information to consciousness is the principle of information exclusion. Yet, no justification is given to support this principle. Tononi claims there is significant empirical support for IIT, but this is called into question by the creation of a trivial theory of consciousness with equal explanatory power. After examining the theoretical and empirical evidence for IIT, arguments from philosophy of mind and epistemology will be examined. Since IIT is not a form of computational functionalism, it is vulnerable to fading/dancing qualia arguments. Finally, the limitations of the phenomenological approach to studying consciousness are examined, and it will be shown that IIT is a theory of protoconsciousness rather than a theory of consciousness.
- Published
- 2015
17. Power Law Scaling in Human and Empty Room MEG Recordings
- Author
-
Manfred G. Kitzbichler, Edward T. Bullmore, Kitzbichler, Manfred [0000-0002-4494-0753], Bullmore, Edward [0000-0002-8955-8283], and Apollo - University of Cambridge Repository
- Subjects
Computer science ,Speech recognition ,Models, Neurological ,Action Potentials ,Power law ,Synaptic Transmission ,Formal Comment ,Cellular and Molecular Neuroscience ,Wavelet ,Biological Clocks ,Genetics ,medicine ,Animals ,Humans ,Computer Simulation ,Cortical Synchronization ,Molecular Biology ,Scaling ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Neurons ,Ecology ,medicine.diagnostic_test ,Brain ,Magnetoencephalography ,Phase synchronization ,Amplitude ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Metric (mathematics) ,Nerve Net ,Algorithm - Abstract
In 2009, we published a paper in PLOS Computational Biology [1] that described using a new, wavelet-based metric of phase synchronization in human MEG data. Specifically, we showed that this metric of phase synchronization, which we called the phase lock index (PLI), demonstrated power law scaling across all frequency intervals or wavelet scales from the low-frequency delta band (1–2 Hz) to the high-frequency gamma band (35–70 Hz). Based on these experimental results, and additional confirmatory data obtained from PLI measurements on time series generated by computational models of critical systems, we offered the interpretation that power law scaling of phase synchronization in human MEG recordings was compatible with the prior theory that human brain dynamics demonstrate self-organized criticality. More recently, in collaboration with a group at the National Institutes of Health (NIH), we published a paper in the Journal of Neuroscience [2] that described using a similar phase synchronization metric to explore scaling behaviour in MEG data recorded from normal human subjects and, crucially, in MEG data recorded with no human subject present, so-called “empty room” data. In Figure 9 of [2], we showed data indicating that phase synchronization appeared to demonstrate power law scaling even in empty room data, as reproduced here in Fig 1 (the first two panels of Figure 9 from Shriki et al. [2]). We originally judged this issue to be of minor concern because, as shown in Figure 1 of [2], NIH empty scanner amplitude variance is about 1–2 orders of magnitude less than that of equivalent brain scans. This is most likely an underestimation of the difference, given that data are Z-normalized and absolute amplitudes of empty scanner data should be lower than those of brain scans. Thus, we reasoned that the contribution from “empty scanner” effects to PLI scaling in brain recordings should be insignificant. Nonetheless, the issue was addressed in the Discussion of [2], where we stated: “because of the ambiguity of PLI for brain scans and empty scanner, additional steps such as amplitude comparisons need to be taken into account” (page 7089 of [2]).
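As a generic illustration of phase-synchronization metrics of this kind, the sketch below computes a standard Hilbert-transform-based phase-locking value between two band-pass-filtered signals; this is not the wavelet-based PLI of [1], and the synthetic signals and filter settings are assumptions made purely for the example.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def phase_locking(x, y, fs, band):
    """Generic phase-locking value of two signals within a frequency band:
    |mean of exp(i * phase difference)|, ranging from 0 (no locking) to 1."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    xf, yf = filtfilt(b, a, x), filtfilt(b, a, y)
    phase_x = np.angle(hilbert(xf))
    phase_y = np.angle(hilbert(yf))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Synthetic example: two 10 Hz oscillations with a fixed lag plus noise.
fs = 250.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
y = np.sin(2 * np.pi * 10 * t + 0.8) + 0.5 * np.random.randn(t.size)
print(phase_locking(x, y, fs, (8, 12)))
```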
- Published
- 2015
18. The Effect of Incentives and Meta-incentives on the Evolution of Cooperation
- Author
-
Okada, Isamu, Yamamoto, Hitoshi, Toriumi, Fujio, and Sasaki, Tatsuya
- Subjects
Motivation ,Game Theory ,Reward ,lcsh:Biology (General) ,Cultural Evolution ,TheoryofComputation_GENERAL ,Cooperative Behavior ,Models, Theoretical ,lcsh:QH301-705.5 ,ComputingMilieux_MISCELLANEOUS ,Research Article - Abstract
Although positive incentives for cooperators and/or negative incentives for free-riders in social dilemmas play an important role in maintaining cooperation, there is still the outstanding issue of who should pay the cost of incentives. The second-order free-rider problem, in which players who do not provide the incentives dominate in a game, is a well-known academic challenge. In order to meet this challenge, we devise and analyze a meta-incentive game that integrates positive incentives (rewards) and negative incentives (punishments) with second-order incentives, which are incentives for other players’ incentives. The critical assumption of our model is that players who tend to provide incentives to other players for their cooperative or non-cooperative behavior also tend to provide incentives to their incentive behaviors. In this paper, we solve the replicator dynamics for a simple version of the game and analytically categorize the game types into four groups. We find that the second-order free-rider problem is completely resolved without any third-order or higher (meta) incentive under the assumption. To do so, a second-order costly incentive, which is given individually (peer-to-peer) after playing donation games, is needed. The paper concludes that (1) second-order incentives for first-order reward are necessary for cooperative regimes, (2) a system without first-order rewards cannot maintain a cooperative regime, (3) a system with first-order rewards and no incentives for rewards is the worst because it never reaches cooperation, and (4) a system with rewards for incentives is more likely to be a cooperative regime than a system with punishments for incentives when the cost-effect ratio of incentives is sufficiently large. This solution is general and strong in the sense that the game does not need any centralized institution or proactive system for incentives., Author Summary Although social dilemmas can be resolved if punishing non-cooperators or rewarding cooperators works, such rewards and punishments, i.e., external incentives, entail certain expenses. As a result, a cooperative player who shirks his or her duty to provide an incentive to other players will emerge, and he or she will be more advantageous than an incentive-provider. In fact, the problem of excluding such cooperative incentive-non-providers, or second-order free-riders, is a well-known academic challenge. In order to meet this challenge, we devise and analyze a meta-incentive game that integrates positive incentives (rewards) and negative incentives (punishments) with second-order incentives, which are incentives for other players’ incentives. In this paper, we solve the replicator dynamics for a simple version of the game and analytically categorize the game types into four groups. We show that second-order incentives for first-order reward are necessary for cooperative regimes. This solution is general and strong in the sense that the game does not need any centralized institution or proactive system for incentives.
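Since the analysis rests on replicator dynamics, the following short sketch shows how such dynamics can be integrated numerically for an arbitrary payoff matrix; the two-strategy payoff values used here are invented (a prisoner's-dilemma-like game) and do not correspond to the paper's meta-incentive or donation games.

```python
import numpy as np

def replicator_trajectory(x0, A, dt=0.01, steps=5000):
    """Euler integration of replicator dynamics
    dx_i/dt = x_i * ((A @ x)_i - x . (A @ x)) for strategy frequencies x."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        f = A @ x                  # fitness of each strategy
        phi = x @ f                # mean population fitness
        x = x + dt * x * (f - phi)
        x = np.clip(x, 0.0, 1.0)
        x /= x.sum()               # stay on the simplex despite Euler error
        traj.append(x.copy())
    return np.array(traj)

# Illustrative two-strategy payoff matrix (row 0: cooperate, row 1: defect).
A = np.array([[3.0, 0.5],
              [4.0, 1.0]])
print(replicator_trajectory([0.5, 0.5], A)[-1])   # converges toward defection
```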
- Published
- 2015
19. How to write a presubmission inquiry
- Author
-
Ruth Nussinov and Thomas Lengauer
- Subjects
Statement (computer science) ,Protocol (science) ,Ecology ,Scope (project management) ,Operations research ,Computer science ,Data science ,Cellular and Molecular Neuroscience ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Argument ,Application domain ,Modeling and Simulation ,Problem domain ,Genetics ,Relevance (law) ,Use case ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics - Abstract
Like many other journals, the journal PLOS Computational Biology admits and in some cases requires presubmission inquiries to be submitted before the submission of a full paper. Presubmission inquiries serve the purpose of informing the journal’s Editorial Board of the essence of the intended submission. Based on the information in the inquiry, the editors can make a quick assessment of its contribution with respect to the criteria for publication in the journal. This assessment is then communicated to the authors. This enables fast turnaround to the authors about the basic suitability of a submission for processing by the journal and spares the editors and reviewers the effort of detailed inspection of submissions that clearly do not meet the criteria of the journal. In this Editorial, we give suggestions for preparing presubmission inquiries for journal submissions. We exemplify these suggestions with reference to presubmission inquiries for the Methods section of PLOS Computational Biology. However, our suggestions generalize to presubmission inquiries of other kinds and for other journals, and in places, we will make specific comments to that effect. Over two years ago, PLOS Computational Biology opened a special section dedicated to Methods papers. As the scope statement spells out, Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities. Since Methods papers are different from other research papers in PLOS Computational Biology and also differ from typical papers on bioinformatics methods published in other journals, a mandatory presubmission stage has been introduced for the submission of Methods papers to PLOS Computational Biology. (Note that a presubmission inquiry is not mandatory for general research papers in PLOS Computational Biology, though it is also mandatory for submission to the Software papers category.) Since the Methods section was launched in October 2012, we have received 334 presubmission inquiries. For roughly half of them (159), we encouraged submission and received full papers, of which 41 papers were published, so far, as Methods papers. We find that, while many presubmission inquiries are informative enough to make an educated decision on the submission, we also receive a number of presubmission inquiries that are not sufficiently informative, such that submission may be discouraged not on the basis of the quality or scope of the paper but on that of the presubmission inquiry. We generally do not allow revisions of presubmission inquiries. In order to minimize the number of papers that fail to get a chance to be published in PLOS Computational Biology merely because of the inadequacy of the presubmission inquiry, here we give a number of suggestions for preparing such an inquiry. The goal of a presubmission inquiry is to make the statement to the Editorial Board that the paper to be submitted reasonably satisfies the criteria detailed in the scope statement. The presubmission inquiry must be detailed enough to convincingly make that point. Most of the insufficiently informative inquiries that we get are either too terse, i.e., they do not give enough detail, or they are not specific enough. 
Therefore, we suggest a way of structuring a presubmission inquiry. These suggestions are the result of our experience with presubmission inquiries over the past couple of years. What is the problem? Please summarize the problem domain, the problem statement, and the relevance of the problem to the general readership of the journal (in the case of PLOS Computational Biology, the biological research community or a substantial subcommunity). What is that subcommunity? How relevant is the problem to them? What is the innovation? Here it is important that you give enough detail on your contribution to allow the Editorial Board to form a clear picture of the substance and relevance of the advance over the state of the art in the field and over your previous work. If your paper presents material that rests on or is related to your own previous publications, this entails addressing dual publication issues. In order to argue your point, you have to summarize the state of the art on which you base your contribution and give the essential ingredients of your innovation. Depending on the journal, the innovation can take different shapes: a contribution to technology or experimental design, a biological finding, a methodological or theoretical piece of work, etc. For papers in the Methods section of PLOS Computational Biology, the computational method is expected to be at the center of the innovation. This section is not for papers whose methodological core has been published elsewhere and for which you present a possibly extended or modified application scenario. Also, studies presenting a comparative assessment of existing methods on an application domain are not within the scope of a Methods paper. We expect a concrete and specific relationship to underlying biological issues. This is why general methods on statistical learning that find their application in biology as well as in other fields of science are typically not considered in scope, unless the paper focuses on sufficiently deep issues of the configuration of the method that are specific to biology. Finally, the method must be the major innovation of the paper. However interesting it may be, a biological finding that has been obtained with methods that are prepublished, or with only minor modifications of prepublished methods, is not within the scope of the Methods papers category. (On the other hand, it may be a suitable General research paper for the journal.) How is the method validated? Validation can take manifold forms but is a key element in most scientific papers. For a theoretical paper, the validation often takes the form of a proof. In contrast, methods in computational biology have manifold forms of validation and, usually, a single form is not sufficient to make the point. For instance, for the Methods section of PLOS Computational Biology, we expect more than an anecdotal validation based on a couple of biological use cases. A validation purely on synthetic data is not sufficient either. Rather, the validation must make a convincing argument for the general applicability of the method in a substantial biological problem domain. Please note, however, that papers that center on the validation of a method that has been published elsewhere are not considered in scope either. The paper has to contain both the method and its application. How is the method being made available? Availability of research results is an increasingly desired and often required aspect of a publication.
For papers that are based on experimental data, making the data and the protocols of the experimental design available is a prerequisite for making the research reproducible. For methods papers, reusability of the method also becomes an issue. For the Methods section of PLOS Computational Biology, we only accept papers on methods that are useful to and can be readily applied by other scientists. The best way of satisfying this criterion is to make the software implementation of the methods openly available. For methods that are not based on software, a workable protocol for how to use the methods must be provided. If you have the full submission ready at the time of the presubmission inquiry, we encourage you to attach it to the inquiry as an optional supplement and mention that you have done this in your cover letter. However, your presubmission inquiry should be worded such that the editors do not need to inspect the complete paper for making their assessment. We wish you much success with your future submission to the Methods section of PLOS Computational Biology.
- Published
- 2015
20. New Methods Section in PLOS Computational Biology
- Author
-
Brian Y. Chen
- Subjects
Protein Structure ,Surface Properties ,Static Electricity ,Sequence alignment ,Computational biology ,Biology ,Biochemistry ,Cellular and Molecular Neuroscience ,Protein structure ,Molecular recognition ,Sequence Analysis, Protein ,Macromolecular Structure Analysis ,Genetics ,Animals ,Cluster Analysis ,Humans ,Amino Acids ,Binding site ,Molecular Biology ,lcsh:QH301-705.5 ,Macromolecular Complex Analysis ,Ecology, Evolution, Behavior and Systematics ,Binding selectivity ,Topology (chemistry) ,chemistry.chemical_classification ,Ecology ,Computational Biology ,Proteins ,Biology and Life Sciences ,Molecular Sequence Annotation ,Amino acid ,Computational Theory and Mathematics ,chemistry ,lcsh:Biology (General) ,Modeling and Simulation ,Cattle ,Algorithms ,Software ,Function (biology) ,Protein Binding ,Research Article - Abstract
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins., Author Summary Proteins, the ubiquitous worker molecules of the cell, are a diverse class of molecules that perform very specific tasks. Understanding how proteins achieve specificity is a critical step towards understanding biological systems and a key prerequisite for rationally engineering new proteins. To examine electrostatic influences on specificity in proteins, this paper presents VASP-E, a software tool that generates solid representations of the electrostatic potential fields that surround proteins. VASP-E compares solids with constructive solid geometry, a class of techniques developed first for modeling complex machine parts. We observed that solid representations could quantify the degree of charge complementarity in protein-protein interactions and identify key residues that strengthen or weaken them. VASP-E correctly identified amino acids with established experimental influences on protein-protein binding specificity. We also observed that solid representations of electrostatic fields could identify electrostatic conservations and variations that relate to similarities and differences in binding specificity between proteins and small molecules.
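The core operation described above, comparing solids that represent electrostatic isopotentials, can be sketched on a voxel grid; the following code is only a toy boolean-overlap illustration with synthetic potentials, not the constructive solid geometry engine used by VASP-E.

```python
import numpy as np

def solid_from_potential(phi, threshold):
    """Turn a scalar potential sampled on a 3-D grid into a boolean 'solid':
    all voxels where the potential exceeds the isopotential threshold."""
    return phi >= threshold

def overlap_stats(solid_a, solid_b, voxel_volume=1.0):
    """CSG-style boolean comparison of two voxelized solids."""
    inter = np.logical_and(solid_a, solid_b)
    union = np.logical_or(solid_a, solid_b)
    return {
        "volume_a": solid_a.sum() * voxel_volume,
        "volume_b": solid_b.sum() * voxel_volume,
        "intersection": inter.sum() * voxel_volume,
        "jaccard": inter.sum() / max(union.sum(), 1),
    }

# Synthetic example: two spherical 'isopotential' regions that partly overlap.
grid = np.indices((40, 40, 40)).astype(float)
phi_a = -np.linalg.norm(grid - np.array([18, 20, 20])[:, None, None, None], axis=0)
phi_b = -np.linalg.norm(grid - np.array([24, 20, 20])[:, None, None, None], axis=0)
print(overlap_stats(solid_from_potential(phi_a, -8), solid_from_potential(phi_b, -8)))
```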
- Published
- 2014
21. ISMB 2014--the premier conference for the World's Computational Biologists
- Author
-
Diane E. Kovats and Christiana N. Fogg
- Subjects
Operations research ,Computer science ,media_common.quotation_subject ,Library science ,Message from ISCB ,Session (web analytics) ,Exhibition ,Cellular and Molecular Neuroscience ,Presentation ,Senior Scientist Award ,Genetics ,Molecular Biology ,Curriculum ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,media_common ,Ecology ,business.industry ,Biology and Life Sciences ,Computational Biology ,Special Interest Group ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Electronic publishing ,business ,Associate professor - Abstract
The 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) will be a world-class scientific meeting that brings together computational biologists and bioinformaticians of every career stage from diverse scientific disciplines. ISMB 2014 will convene at the John B. Hynes Memorial Convention Center in Boston, Massachusetts, on July 11–15, 2014. ISMB is the flagship conference of the International Society for Computational Biology (ISCB). This unique meeting draws scientists from a broad range of fields that use computational biology and bioinformatics, including sequence analysis, comparative genomics, proteomics, structural biology, data mining, and systems biology. ISMB 2014 is anchored by six keynote presentations from world-renowned scientists. Isaac “Zak” Kohane, the Director of the Children's Hospital Informatics Program and the Henderson Professor of Pediatrics and Health Sciences and Technology at Harvard Medical School, will be speaking on Sunday, July 13. Kohane's unique background in both pediatric endocrinology and computer science has enabled him to develop a research program that uses genomics to better understand the genetic basis of diseases, including autism and cancer. He has also developed computer systems that permit the use of information from electronic health records for genetic studies while maintaining patient privacy. Sunday, July 13, will also feature a keynote presentation by Eugene “Gene” Myers, the 2014 recipient of the ISCB Accomplishment by a Senior Scientist Award. This award honors luminaries in the fields of computational biology and bioinformatics who have made significant contributions to these areas through research, education, and service. Myers is the Director and Tschira Chair of Systems Biology at the Max Planck Institute of Molecular Biology and Genetics in Dresden, Germany. Myers is well known for his work on developing the BLAST algorithm for sequence comparison, as well as his work on using shotgun sequencing to sequence the human genome at Celera Genomics. His research is now focused on computational bioimaging. He has developed new microscopic devices and software that are used for building 3D biological models, and these tools are providing unparalleled insights into the inner workings of cells and systems. Michal Linial, a Professor of Biochemistry, Molecular Biology, and Bioinformatics at the Hebrew University of Jerusalem, Israel, will be a keynote speaker on Monday, July 14. Linial is the Director of the Sudarsky Center for Computational Biology and is the first female head of the Israel Institute for Advanced Studies. Her broad research activities encompass both “wet lab” projects and computational modeling, with particular interests in neuronal cell differentiation and synapse formation, proteomic analysis of membrane proteins, and functional genomics. The 2014 winner of the Overton Prize, Dana Pe'er, is featured as a keynote speaker on Monday, July 14, as well. The Overton Prize recognizes early- or mid-career scientists working in computational biology or bioinformatics who are rising leaders in these fields. Pe'er is an Associate Professor in the Department of Biological Sciences and Systems Biology at Columbia University. Her research focuses on understanding cellular and molecular networks at a holistic level by using computational approaches to analyze complex data sets. 
Robert Langer, a Professor in the Department of Chemical Engineering at the Massachusetts Institute of Technology, will give a keynote presentation on Tuesday, July 15. Langer is a prolific researcher who works on developing novel drug-delivery systems, with a particular interest in using polymers to deliver therapeutic molecules like DNA and genetically engineered proteins. Langer's innovative work was recognized most recently when he was selected as a recipient of the 2014 Breakthrough Prize in Fundamental Physics and Life Sciences. The last keynote presentation will be given on Tuesday, July 15, by Russ Altman, a Professor of Bioengineering, Genetics, and Medicine, and Computer Science. Altman has been selected as this year's ISCB Fellows Keynote Speaker. He works on building and applying new algorithms to explore diverse topics including RNA structure, how drug efficacy is impacted by genomics, and how to model motion and dynamics of biological structures. Beyond the keynote speakers, ISMB will be brimming with talks on cutting-edge discoveries across diverse areas. The Special Sessions track will run throughout the meeting and will feature hot topics that have not been featured in previous ISMB meetings. The Highlights and Proceedings tracks are also popular conference tracks that include oral presentations based on recently published papers selected through rigorous peer-review processes. The Proceedings papers are also published as an online-only open-access section of the Bioinformatics journal. The Technology track features presentations that showcase the use of novel software or hardware relevant to computational biologists. The Late Breaking Research track will also feature talks on a wide range of topics of significant interest to the bioinformatics community. A large poster session will provide an opportunity for trainees and scientists from every career stage to present their latest research findings in a collegial and collaborative atmosphere. “Birds of a Feather” sessions and workshops will be more informal sessions that encourage discussion and collaboration. These sessions will feature such themes as bioinformatics curriculum guidelines, personalized medicine, bioinformatics core facility management, trends in digital publishing, and data analysis. The exhibit hall will showcase a wide variety of organizations and companies that are developing tools and reagents relevant to computational biologists and bioinformaticians, and attendees will be able to see some of these items in action at exhibitor presentations. The ISCB Student Council will be organizing several high profile events throughout ISMB 2014. The annual Student Council Symposium will convene just prior to ISMB 2014 and will include talks by a keynote speaker and student presenters, as well as a poster session. Opportunities for career guidance and social events are also included. In addition, the ISCB Student Council will be coordinating an Art & Science Exhibition during the ISMB meeting that will feature images and videos of scientific material derived from research projects or artwork generated from scientific tools or methods. Saturday, July 12, and Sunday, July 13, will be filled with substantive one- and two-day specialized meetings that precede the main ISMB meeting. Special Interest Group (SIG) and Satellite meetings will be focused on a range of topics that include but are not limited to structural bioinformatics, mass spectrometry, and regulatory genomics. 
Two half-day tutorial sessions will also be held on July 12 and will feature (1) Computational Metagenomics and (2) Wikipedia: WikiProject Computational Biology. Several social events will balance out the program for ISMB 2014 and will create ample opportunities for attendees to gather together in informal settings. An opening reception is scheduled for the evening of Saturday, July 12, and poster viewing receptions are being held on both Sunday, July 13, and Monday, July 14. A World Cup viewing area will also be set up in the Exhibit Hall. As a long-standing hub of biological and computational research breakthroughs, Boston promises to be an excellent host to ISMB 2014. Both local Boston- and Cambridge-area scientists, as well as visitors from every corner of the globe, will be showcasing diverse topics that span from personalized medicine, to machine learning in systems biology, to open-source bioinformatics software development. This must-see event has something for everyone and is an excellent destination to start your next collaboration.
- Published
- 2014
22. International Society for Computational Biology Honors David Eisenberg with 2013 Accomplishment by a Senior Scientist Award
- Author
-
Christiana N. Fogg and Diane E. Kovats
- Subjects
Ecology ,Computer science ,media_common.quotation_subject ,Computational Biology ,Subject (documents) ,Computational biology ,Humanism ,Message from ISCB ,Cellular and Molecular Neuroscience ,Amyloid disease ,Senior Scientist Award ,Senior Scientist ,lcsh:Biology (General) ,Computational Theory and Mathematics ,Undergraduate research ,Modeling and Simulation ,Genetics ,Curiosity ,lcsh:QH301-705.5 ,Molecular Biology ,Biology ,Ecology, Evolution, Behavior and Systematics ,Graduation ,media_common - Abstract
The International Society for Computational Biology (ISCB; http://www.iscb.org) honors a senior scientist each year for his or her outstanding achievements. The ISCB Accomplishment by a Senior Scientist Award recognizes a leading member of the computational biology community for his or her significant contributions to the field through research, service, and training. The 2013 ISCB Accomplishment by a Senior Scientist Award honors Dr. David Eisenberg of the University of California Los Angeles (UCLA). Dr. Eisenberg (Image 1) was selected by the ISCB's awards committee, which is chaired by Dr. Alfonso Valencia of the Spanish National Cancer Research Center (CNIO) in Madrid. Dr. Eisenberg will receive his award and deliver a keynote address at the ISCB's 21st annual Intelligent Systems for Molecular Biology (ISMB) meeting. This meeting is being held jointly with the 12th European Conference on Computational Biology and will take place in Berlin, Germany, on July 19–23, 2013 (http://www.iscb.org/ismbeccb2013). Image 1. David Eisenberg, UCLA. 2013 ISCB Accomplishment by a Senior Scientist Award: David Eisenberg. David Eisenberg's love of medicine and science was cultivated first during his childhood by his father, a gentle and beloved pediatrician. Eisenberg recalled, “Every night after dinner he would make house calls. I saw how appreciated—even loved—he was in our village.” Eisenberg's father also stoked his scientific curiosity by encouraging him to try some experiments in their basement, including attempts to petrify an egg and to grow worms in chocolate. Eisenberg reminisced, “None of these [experiments] worked, but they were fun!” Eisenberg strongly considered following in his father's footsteps and pursuing a career in medicine. With that goal in mind, he focused his undergraduate studies on biochemical sciences at Harvard University. As a sophomore, he was assigned to Dr. John T. Edsall as a tutor. Edsall was a pioneering researcher in the field of biophysical chemistry, and under his guidance, Eisenberg had his first encounter with laboratory research. “In my junior year, he assigned me to read scientific papers, most of which baffled me, and at the end of that year, I started a research project in his lab, which became the subject of my senior thesis,” Eisenberg recounted. “After graduation, Dr. Edsall turned my thesis into a short paper which was published in Science.” In spite of Eisenberg's eye-opening undergraduate research experiences, he applied and was accepted to medical school. Edsall was also trained as a medical doctor, but Eisenberg remembered how “Dr. Edsall convinced me that if my goal was to improve the health of mankind, I might have a greater impact working in biochemistry, than as a practicing physician.” Eisenberg took Edsall's advice to heart and “finessed making an immediate choice by going to Oxford to study theoretical chemistry under Dr. Charles Coulson, one of the founders of quantum chemistry.” Edsall's guidance had also given him a strong foundation in math and physics, which served him well as a graduate student at Oxford as he recalled being “(just) able to work with Coulson on the energetics of hydrogen bonding.” Eisenberg's postdoctoral studies took him to Princeton in 1964 to work with Dr. Walter Kauzmann, well known for his discovery of the hydrophobic interaction. Eisenberg recollected his ambitious postdoctoral plan “to compute the energy of the hydrophobic interaction in myoglobin, the first protein with a known 3D structure.
This plan now seems hopelessly naive: computers were not yet up to such a calculation, potential functions and theory had not advanced to the point that this was a practical problem, and the early protein crystallographers were not eager to release their atomic coordinates.” In light of these challenges, Eisenberg's work with Kauzmann culminated in “a monograph on ice and water, which, incidentally, is still in print 44 years later.” His failed postdoctoral research plan also opened his eyes. He knew that if he wanted to pursue protein energetics, which required knowing protein coordinates, he had to learn X-ray crystallography. Eisenberg's next postdoc took him “to Caltech to study X-ray crystallography with Richard Dickerson, who had been part of the team who had determined the structure of myoglobin.” His X-ray crystallography training was pivotal to establishing his own lab at UCLA, which focused on studying diverse protein structures. Melittin, a component of bee venom, was one of the first structures he determined with his then graduate student Tom Terwilliger. Eisenberg vividly recalled that, “At last I was able to get down to energetic calculations on a protein, and came up with the idea of the hydrophobic moment. This and related ideas gave me, for the first time, the feeling that I could make discoveries.” Eisenberg also remembers the excitement of solving the structure of diphtheria toxin dimer, which he worked on with John Collier, Senyon Choe, and Melanie Bennett (Brewer). He recalled the excitement that stemmed from Bennett's observation that “two monomers of the dimer swapped their third domains, and we called this phenomenon “3D domain swapping.” We explored the implications of 3D domain swapping, again calling on my background in energetics. Diphtheria toxin was the first structural example of 3D domain swapping; now there are hundreds.” Eisenberg's work on protein structures awakened his interest in how protein sequences related to 3D structures. While on sabbatical at the Laboratory of Molecular Biology in Cambridge, he worked with Andrew McLachlan and Mike Gribskov to develop methods to examine protein sequences and use profile analysis to predict the presence of potential structural motifs. These studies led to his work on 3D profiles with Jim Bowie and Roland Luethy, which Eisenberg has now seen “applied to many protein problems.” Burkhard Rost, president of the ISCB, considers Eisenberg's work on hydrophobicity profiling as groundbreaking because it “describes an important feature of the constituents of proteins (amino acids), namely their preferences to stay away from the solvent water (hydro = water, phobie = animosity). Many other outstanding, original methods followed for the prediction of protein structure and function; many of those methods were so visionary that they started entire fields of research.” The availability of the first complete genome sequences in the late 1990s inspired Eisenberg's work with “colleague Todd Yeates and our two talented postdoctoral fellows Edward Marcotte and Matteo Pellegrini, [in which] we found we could extract information on protein interactions from sequenced genomes.” These cutting edge studies resulted in several publications that showed how protein function and protein-protein interactions could be predicted from genome sequences. Eisenberg has focused his research over the last decade on studying amyloid-forming proteins. 
Several neurodegenerative diseases are associated with amyloid-forming proteins, including Alzheimer's, Parkinson's and amyotrophic lateral sclerosis (Lou Gehrig's) disease. “Just before the turn of the century, I realized that amyloid diseases represent the greatest unmet medical problem facing the world,” Eisenberg recounted. “And at the same time, I realized that structural and computational biology, which have illuminated other areas of biomedicine so well, have not been widely applied to the fundamental problems of amyloid disease. In particular, there had been almost no single crystal X-ray studies of amyloid-forming proteins.” The use of computational biology with this structural data has helped support the definition of the “amyloid state” of proteins. “Bioinformatics and computational biology are great partners with structural biology. Using the tools together can be surprisingly powerful,” said Eisenberg. Eisenberg's group has studied the structural basis of how normal proteins convert to amyloid fibrils. They have gained great insight into this conversion process by determining the atomic structures of the spines of many different types of amyloid fibrils. Eisenberg also acknowledges that, “Having several friends afflicted with amyloid disorders is a great inspiration. I would love to be able to help them, and others. If we can, it would validate Dr. Edsall's advice that sometimes biochemists can do as much, or more, to help mankind than physicians.” Eisenberg remains humble about his accomplishments. When asked about being the recipient of the ISCB Senior Scientist Accomplishment Award, he felt “honored, but perhaps over-honored. There are many others who are equally, or more, deserving of this recognition.” But he also recognizes that this award helps highlight the importance of studying amyloid diseases, especially by using the tools of computational biology. Eisenberg speaks warmly of the mentors that have guided and shaped his scientific training. “I was enormously fortunate to find myself in the research groups of four great mentors: John Edsall, Charles Coulson, Walter Kauzmann, and Richard Dickerson, not to mention my father. All were creative scientists, and also humanists. Watching them I saw their pleasure in scientific discovery, and also saw their insistence on fairness to all those involved in the process of science.” Their examples have not only served him well as a scientist, but also as a mentor. Eisenberg delights in working with trainees because he loves “their eagerness to learn and to succeed, and their willingness to think freshly about hard problems.” Eisenberg's scientific curiosity remains insatiable, and when asked for advice to motivate young scientists, his sage answer was “work on fundamental problems, maintain your curiosity, and above all, persevere.”
- Published
- 2013
23. Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment
- Author
-
Jeremy Chien, Wei Zhang, Baolin Wu, Viji Shridhar, Takayo Ota, and Rui Kuang
- Subjects
Microarrays ,Fibrillin-1 ,Gene regulatory network ,Bioinformatics ,0302 clinical medicine ,Ovarian carcinoma ,Protein Interaction Maps ,lcsh:QH301-705.5 ,Ovarian Neoplasms ,0303 health sciences ,Ecology ,Systems Biology ,Statistics ,Microfilament Proteins ,Genomics ,3. Good health ,Functional Genomics ,Treatment Outcome ,Computational Theory and Mathematics ,030220 oncology & carcinogenesis ,Modeling and Simulation ,Biomarker (medicine) ,Female ,Research Article ,Computational biology ,Biology ,Fibrillins ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Genetics ,medicine ,Biomarkers, Tumor ,Humans ,Statistical Methods ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Survival analysis ,030304 developmental biology ,Proportional Hazards Models ,Proportional hazards model ,Gene Expression Profiling ,Cancer ,Computational Biology ,Reproducibility of Results ,medicine.disease ,Survival Analysis ,Gene expression profiling ,lcsh:Biology (General) ,Neoplasm Recurrence, Local ,Ovarian cancer ,Mathematics - Abstract
Cox regression is commonly used to predict the outcome, defined by the time to an event of interest, and, in addition, to identify relevant features for survival analysis in cancer genomics. Due to the high dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply Net-Cox to a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into the Cox proportional hazards model to explore the co-expression or functional relations among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets, including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over Cox models regularized by the L1 or L2 norm. This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict these events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases. In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with early recurrence after 12 months of treatment in ovarian cancer patients who are initially sensitive to chemotherapy. The Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/., Author Summary Network-based computational models are attracting increasing attention in studying cancer genomics because molecular networks provide valuable information on the functional organization of molecules in cells. Survival analysis, mostly with the Cox proportional hazards model, is widely used to predict or correlate gene expression with time to an event of interest (outcome) in cancer genomics. Surprisingly, network-based survival analysis has not received enough attention. In this paper, we studied resistance to chemotherapy in ovarian cancer with a network-based Cox model, called Net-Cox. The experiments confirm that networks representing gene co-expression or functional relations can be used to improve the accuracy and the robustness of survival prediction of outcome in ovarian cancer treatment. The study also revealed subnetwork signatures that are enriched by extracellular matrix receptors and modulators and by the downstream nuclear signaling components of extracellular signal-regulated kinases, respectively. In particular, FBN1, which was detected as a signature gene of high confidence by Net-Cox with network information, was validated as a biomarker for predicting early recurrence in platinum-sensitive ovarian cancer patients in the laboratory.
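The general idea of network-regularized Cox regression can be sketched as a penalized partial likelihood: the code below adds a graph-Laplacian penalty to a plain Cox negative partial log-likelihood on toy data. It is a minimal sketch under that assumption, not the Net-Cox toolbox, and the network, penalty weight, and data are invented.

```python
import numpy as np
from scipy.optimize import minimize

def neg_partial_loglik(beta, X, time, event):
    """Cox negative partial log-likelihood (simple Breslow-style risk sets)."""
    eta = X @ beta
    order = np.argsort(-time)                       # sort by decreasing time
    eta_sorted = eta[order]
    log_risk = np.logaddexp.accumulate(eta_sorted)  # log-sum-exp over each risk set
    ll = eta_sorted - log_risk
    return -np.sum(ll[event[order] == 1])

def net_cox_objective(beta, X, time, event, L, lam):
    """Partial likelihood plus a graph-Laplacian smoothness penalty on beta."""
    return neg_partial_loglik(beta, X, time, event) + lam * beta @ L @ beta

# Toy data: 50 samples, 5 genes, with genes 0 and 1 linked in the network.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
time = rng.exponential(scale=np.exp(-X[:, 0]))
event = rng.integers(0, 2, size=50)
W = np.zeros((5, 5)); W[0, 1] = W[1, 0] = 1.0
L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian
fit = minimize(net_cox_objective, np.zeros(5),
               args=(X, time, event, L, 1.0), method="L-BFGS-B")
print(fit.x)
```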
- Published
- 2013
24. Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds
- Author
-
Mounya Elhilali and Michael A. Carlin
- Subjects
Male ,Computer science ,Speech recognition ,Audio Signal Processing ,0302 clinical medicine ,Engineering ,Cluster Analysis ,Natural sounds ,lcsh:QH301-705.5 ,Neurons ,0303 health sciences ,education.field_of_study ,Coding Mechanisms ,Ecology ,Sensory Systems ,medicine.anatomical_structure ,Computational Theory and Mathematics ,Modeling and Simulation ,Auditory Perception ,Female ,Research Article ,Population ,Models, Neurological ,Sensory system ,Stimulus (physiology) ,Auditory cortex ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Neuronal tuning ,Genetics ,medicine ,Auditory system ,Animals ,Humans ,Speech ,education ,Molecular Biology ,Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Computational Neuroscience ,Auditory Cortex ,Quantitative Biology::Neurons and Cognition ,Ferrets ,Computational Biology ,lcsh:Biology (General) ,Acoustic Stimulation ,Receptive field ,Signal Processing ,Vocalization, Animal ,Noise ,030217 neurology & neurosurgery ,Neuroscience - Abstract
The processing characteristics of neurons in the central auditory system are directly shaped by and reflect the statistics of natural acoustic environments, but the principles that govern the relationship between natural sound ensembles and observed responses in neurophysiological studies remain unclear. In particular, accumulating evidence suggests the presence of a code based on sustained neural firing rates, where central auditory neurons exhibit strong, persistent responses to their preferred stimuli. Such a strategy can indicate the presence of ongoing sounds, is involved in parsing complex auditory scenes, and may play a role in matching neural dynamics to varying time scales in acoustic signals. In this paper, we describe a computational framework for exploring the influence of a code based on sustained firing rates on the shape of the spectro-temporal receptive field (STRF), a linear kernel that maps a spectro-temporal acoustic stimulus to the instantaneous firing rate of a central auditory neuron. We demonstrate the emergence of richly structured STRFs that capture the structure of natural sounds over a wide range of timescales, and show how the emergent ensembles resemble those commonly reported in physiological studies. Furthermore, we compare ensembles that optimize a sustained firing code with one that optimizes a sparse code, another widely considered coding strategy, and suggest how the resulting population responses are not mutually exclusive. Finally, we demonstrate how the emergent ensembles contour the high-energy spectro-temporal modulations of natural sounds, forming a discriminative representation that captures the full range of modulation statistics that characterize natural sound ensembles. These findings have direct implications for our understanding of how sensory systems encode the informative components of natural stimuli and potentially facilitate multi-sensory integration., Author Summary We explore a fundamental question with regard to the representation of sound in the auditory system, namely: what are the coding strategies that underlie observed neurophysiological responses in central auditory areas? There has been debate in recent years as to whether neural ensembles explicitly minimize their propensity to fire (the so-called sparse coding hypothesis) or whether neurons exhibit strong, sustained firing rates when processing their preferred stimuli. Using computational modeling, we directly confront issues raised in this debate, and our results suggest that not only does a sustained firing strategy yield a sparse representation of sound, but the principle yields emergent neural ensembles that capture the rich structural variations present in natural stimuli. In particular, spectro-temporal receptive fields (STRFs) have been widely used to characterize the processing mechanisms of central auditory neurons and have revealed much about the nature of sound processing in central auditory areas. In our paper, we demonstrate how neurons that maximize a sustained firing objective yield STRFs akin to those commonly measured in physiological studies, capturing a wide range of aspects of natural sounds over a variety of timescales, suggesting that such a coding strategy underlies observed neural responses.
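For readers unfamiliar with STRFs, the sketch below estimates a linear spectro-temporal kernel from a spectrogram and a firing rate by ridge regression on time-lagged features; this is the standard regression-style STRF estimate on synthetic data, not the sustained-firing optimization developed in the paper.

```python
import numpy as np

def build_design(spec, n_lags):
    """Stack time-lagged copies of a (freq x time) spectrogram so that each row
    contains the stimulus history preceding one time bin."""
    n_freq, n_time = spec.shape
    X = np.zeros((n_time, n_freq * n_lags))
    for lag in range(n_lags):
        shifted = np.zeros_like(spec)
        shifted[:, lag:] = spec[:, :n_time - lag]
        X[:, lag * n_freq:(lag + 1) * n_freq] = shifted.T
    return X

def estimate_strf(spec, rate, n_lags=10, ridge=1.0):
    """Ridge-regression STRF: the linear kernel mapping the spectro-temporal
    stimulus to the instantaneous firing rate."""
    X = build_design(spec, n_lags)
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ rate)
    return w.reshape(n_lags, spec.shape[0]).T        # (freq x lag) kernel

# Synthetic check: a known kernel generates the rate; we recover its shape.
rng = np.random.default_rng(2)
spec = rng.random((16, 2000))
true = rng.normal(size=(16, 10))
rate = build_design(spec, 10) @ true.T.ravel() + 0.1 * rng.normal(size=2000)
strf = estimate_strf(spec, rate)
print(strf.shape)
```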
- Published
- 2013
25. Dynamic modeling of cell migration and spreading behaviors on fibronectin coated planar substrates and micropatterned geometries
- Author
-
H. Harry Asada, Devin Neal, Mincheol Kim, Roger D. Kamm, Massachusetts Institute of Technology. Department of Mechanical Engineering, Massachusetts Institute of Technology. School of Engineering, Singapore-MIT Alliance in Research and Technology (SMART), Kim, Min-Cheol, Neal, Devin M., Kamm, Roger Dale, and Asada, Harry
- Subjects
Biophysics Simulations ,Cell membrane ,0302 clinical medicine ,Cell Movement ,Cricetinae ,Stress Fibers ,Biomechanics ,Pseudopodia ,Cytoskeleton ,lcsh:QH301-705.5 ,0303 health sciences ,Ecology ,biology ,Systems Biology ,Cell migration ,Biomechanical Phenomena ,Cell biology ,medicine.anatomical_structure ,Computational Theory and Mathematics ,Modeling and Simulation ,Biophysic Al Simulations ,Lamellipodium ,Research Article ,Materials science ,Nuclear Envelope ,Cytological Techniques ,Integrin ,Biophysics ,CHO Cells ,Models, Biological ,Focal adhesion ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Cricetulus ,Genetics ,medicine ,Animals ,Computer Simulation ,Biology ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Focal Adhesions ,Cell Membrane ,Computational Biology ,Fibronectins ,Fibronectin ,lcsh:Biology (General) ,biology.protein ,030217 neurology & neurosurgery - Abstract
An integrative cell migration model incorporating focal adhesion (FA) dynamics, cytoskeleton and nucleus remodeling, actin motor activity, and lamellipodia protrusion is developed for predicting cell spreading and migration behaviors. This work is motivated by two experimental studies: (1) cell migration on 2-D substrates under various fibronectin concentrations and (2) cell spreading on 2-D micropatterned geometries. These studies suggest (1) that cell migration speed reaches a maximum at a particular ligand density (∼1140 molecules/µm²) and (2) that strong traction forces at the corners of the patterns may exist due to combined effects exerted by actin stress fibers (SFs). The integrative model of this paper successfully reproduces these experimental results and indicates the mechanisms of cell migration and spreading. In this paper, the mechanical structure of the cell is modeled as having two elastic membranes: an outer cell membrane and an inner nuclear membrane. The two elastic membranes are connected by SFs, which extend from focal adhesions on the cortical surface to the nuclear membrane. In addition, the model also includes ventral SFs bridging two focal adhesions on the cell surface. The cell deforms and gains traction as transmembrane integrins distributed over the outer cell membrane bind to ligands on the ECM surface, activate SFs, and form focal adhesions. The relationship between cell migration speed and fibronectin concentration agrees with existing experimental data for Chinese hamster ovary (CHO) cell migration on fibronectin-coated surfaces. In addition, the integrated model is validated by showing persistent high stress concentrations at sharp geometrically patterned edges. This model will be used as a predictive model to assist in the design and data processing of upcoming microfluidic cell migration assays., Author Summary Cell migration is a complex, multifaceted process, triggered by chemotactic and haptotactic responses from the extracellular matrix (ECM). It begins with a thin lamellipodium protrusion at the leading edge, followed by the assembly of a number of focal adhesions between the lamellipodium base and the ECM. Afterwards, actin stress fibers extend from nascent focal adhesions, some of which connect to the nucleus. In this work, we have developed a dynamic model of cell migration incorporating four of these cell-biological mechanisms: remodeling of the cell and nuclear membranes, focal adhesion dynamics, actin motor activity, and lamellipodia protrusion at the leading edge. We successfully compared our model with existing experimental work on (1) cell migration on substrates with various fibronectin coating concentrations and (2) cell spreading on three patterned surfaces. Finally, our model demonstrates how actin stress fibers anchored at the trailing edge play a key role in increasing cell migration speed. The model will thus not only provide new insights for designing such experiments, but further experiments will in turn allow us to better validate the model.
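One ingredient of such models, stress fibers acting as contractile links between focal adhesions and the nucleus, can be sketched as an overdamped force balance; the code below is a toy illustration with invented stiffness, contractility, and drag values, and it omits the membranes, FA dynamics, and lamellipodium of the full model.

```python
import numpy as np

# Focal adhesion anchor points on the substrate (illustrative positions, in um).
adhesions = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 15.0]])
nucleus = np.array([4.0, 4.0])     # initial nucleus position

k_sf = 0.5        # stress-fiber spring stiffness (nN/um), invented
f_active = 2.0    # active contractile force per fiber (nN), invented
drag = 10.0       # viscous drag coefficient (nN*s/um), invented
dt = 0.01

for _ in range(5000):
    force = np.zeros(2)
    for fa in adhesions:
        d = fa - nucleus
        dist = np.linalg.norm(d)
        n = d / dist
        # Passive elastic stretch plus constant actomyosin contractility,
        # both pulling the nucleus toward the focal adhesion.
        force += (k_sf * dist + f_active) * n
    nucleus = nucleus + dt * force / drag   # overdamped (no inertia) update

print(nucleus)   # settles where the stress-fiber forces balance
```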
- Published
- 2013
26. ADEMA: an algorithm to determine expected metabolite level alterations using mutual information
- Author
-
Gultekin Ozsoyoglu, Leigh Henderson, A. Ercument Cicek, Mitchell L. Drumm, and Ilya Bederman
- Subjects
Multivariate statistics ,Multivariate analysis ,Metabolite ,Metabolic network ,Computational biology ,Biology ,Bioinformatics ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,chemistry.chemical_compound ,Metabolic Networks ,Mice ,0302 clinical medicine ,Metabolomics ,Genetics ,Animals ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,0303 health sciences ,Ecology ,Lipogenesis ,Computational Biology ,Mutual information ,Models, Theoretical ,Omics ,Statistical classification ,Computational Theory and Mathematics ,chemistry ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,Modeling and Simulation ,Computer Science ,Multivariate Analysis ,Algorithms ,Research Article - Abstract
Metabolomics is a relatively new “omics” platform, which analyzes a discrete set of metabolites detected in bio-fluids or tissue samples of organisms. It has been used in a diverse array of studies to detect biomarkers and to determine activity rates for pathways based on changes due to disease or drugs. Recent improvements in analytical methodology and large sample throughput allow for the creation of large datasets of metabolites that reflect changes in metabolic dynamics due to disease or a perturbation in the metabolic network. However, current methods for comprehensive analyses of large metabolic datasets (metabolomics) are limited, unlike other “omics” approaches where complex techniques for analyzing coexpression/coregulation of multiple variables are applied. This paper discusses the shortcomings of current metabolomics data analysis techniques and proposes a new multivariate technique (ADEMA) based on mutual information to identify expected metabolite level changes with respect to a specific condition. We show that ADEMA better predicts De Novo Lipogenesis pathway metabolite level changes in samples with Cystic Fibrosis (CF) than prediction based on the significance of individual metabolite level changes. We also applied ADEMA's classification scheme to three different cohorts of CF and wild-type mice. ADEMA was able to predict whether an unknown mouse had a CF or a wild-type genotype with accuracies of 1.0, 0.84, and 0.9 on the respective datasets. ADEMA's results were up to 31% more accurate than those of other classification algorithms. In conclusion, ADEMA advances the state of the art in metabolomics analysis by providing accurate and interpretable classification results., Author Summary Metabolomics is an experimental approach that analyzes differences in metabolite levels detected in experimental samples. It has been used in the literature to understand the changes in metabolism with respect to diseases or drugs. Unlike in transcriptomics or proteomics, which analyze gene and protein expression levels, respectively, techniques that consider the co-regulation of multiple metabolites are quite limited in metabolomics. In this paper, we propose a novel technique, called ADEMA, which computes the expected level changes for each metabolite with respect to a given condition. ADEMA considers multiple metabolites at the same time and is mutual information (MI)-based. We show that ADEMA predicts metabolite level changes for young mice with Cystic Fibrosis (CF) better than significance testing that considers one metabolite at a time. Using three different datasets that contain CF and wild-type (WT) mice, we show that ADEMA can classify an individual as being CF or WT based on its metabolic profile (with 1.0, 0.84, and 0.9 accuracy, respectively). Compared to other well-known classification algorithms, ADEMA's accuracy is higher by up to 31%.
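Since ADEMA's central quantity is the mutual information between (groups of) discretized metabolite levels and the experimental condition, a small self-contained sketch of that computation may help. This is a generic plug-in MI estimate on toy data, not the published ADEMA implementation; the metabolite values and labels below are hypothetical.

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * np.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy data: discretized (low=0 / high=1) levels of two metabolites in WT vs CF mice.
condition = [0, 0, 0, 0, 1, 1, 1, 1]          # 0 = wild type, 1 = cystic fibrosis
metab_a   = [0, 0, 1, 0, 1, 1, 1, 1]          # tracks the condition fairly well
metab_b   = [1, 0, 1, 0, 0, 1, 0, 1]          # roughly independent of the condition

# Group-wise MI (metabolites considered jointly) vs single-metabolite MI.
joint = list(zip(metab_a, metab_b))
print("I(A; condition)   = %.3f bits" % mutual_information(metab_a, condition))
print("I(B; condition)   = %.3f bits" % mutual_information(metab_b, condition))
print("I(A,B; condition) = %.3f bits" % mutual_information(joint, condition))
```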
- Published
- 2013
27. Education in computational biology today and tomorrow
- Author
-
Joanne A. Fox and B. F. Francis Ouellette
- Subjects
Computer science ,Best practice ,media_common.quotation_subject ,Computational biology ,Science education ,Education ,Cellular and Molecular Neuroscience ,ComputingMilieux_COMPUTERSANDEDUCATION ,Genetics ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Pace ,media_common ,Enthusiasm ,Ecology ,Computational Biology ,New media ,Outreach ,Editorial ,Workflow ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Associate professor ,Forecasting - Abstract
The etymology of the word “education” in Wikipedia is enlightening: “a rearing” and “I lead forth” (http://en.wikipedia.org/wiki/Education#Etymology). Computational biology educators are leading and raising the next generation of scientists and, in doing so, are in need of new tools, methods, and approaches. The need for education in science, and in computational biology in particular, is greater than ever. Large datasets, -omics technologies, and overlapping domains permeate many of the big research questions of our day. PLOS Computational Biology originally created the Education section to highlight the importance of education in the field [1]. Thus, it was a great honor when Fran Lewitter, Education Editor for the past eight years, along with Philip E. Bourne and Ruth Nussinov, contacted us to work as editors of the PLOS Computational Biology Education section. In our minds, educational initiatives in computational biology and bioinformatics serve two important goals: to communicate digital biology to each other, and to educate others on how best to do this. These are themes we practice as educators in our university teaching, in our involvement with the bioinformatics.ca workshop series, and in our outreach efforts. We are very excited to carry on Fran's great vision as we continue her work with the PLOS Computational Biology staff. Examples of tutorials, specialized workshops, and outreach programs that bridge the knowledge gap created by this fast pace of research have been previously highlighted in this collection. There have been several types of articles, but two stand out. Firstly, there are tutorials about a specific biological problem requiring a specific approach, tools, and databases. For example, “Practical Strategies for Discovering Regulatory DNA Sequence Motifs” by MacIsaac and Fraenkel [2]. Tutorial articles provide theoretical context, as well as the types of questions that can be asked and how to answer them. The other type of article we frequently find in the Education collection is “primers” or “quick guides.” For example, Eglen's “A Quick Guide to Teaching R Programming to Computational Biology Students” [3] or Bassi's “A Primer on Python for Life Science Researchers” [4]. Both of these examples from the Education collection address an important niche within the community. The “Quick Guide” series provides a more generic introduction to an approach in computational biology that can be applied across multiple domains. All of these types of articles will continue to be well-supported and encouraged in the Education collection. Many other articles have also been well-received, and seem to address gaps in the educational material. We want to revisit older collection papers and identify where methods and technologies have evolved to a point where new methods are now in use, and invite previous or new authors to contribute. These initiatives help to extend computational biology beyond the domain of specialized laboratories. Researchers, at all levels, need to keep themselves up-to-date with the quickly changing world of computational biology, and trainees need programs where bioinformatics skills are embedded so they can have comprehensive training. New bioinformatics workflows can be adopted more widely if education efforts keep pace. As previously pointed out [5], starting early is also very important. There is still room for programs that capture the excitement and enthusiasm of secondary school students and convey the potential of computational biology to the public.
We welcome additions to the PLOS Computational Biology “Bioinformatics: Starting Early” collection (www.ploscollections.org/cbstartingearly). We would like to involve the community in this endeavor. With this editorial, we are calling out to educators and researchers who have experience in teaching, specifically those keen to raise the expectations and the inquisitiveness of the next generation of biologists. The Education collection will continue to publish leading-edge education materials in the form of tutorials that can be used in a “classroom” setting (whatever that may mean nowadays: stated more generically, “the places where people learn”). We will continue to encourage articles set in the context of addressing a particular biological question and, as mentioned above, we welcome new “primers” and “quick guides.” We will also be inviting tutorials from the various computational meetings. A new category of papers that is in the pipeline for the Education collection is the “Quick Tips” format, the first of which was just published [6]. The “Quick Tips” articles address specific tools or databases that are in wide use in the community. We also hope, and plan, to incorporate new thinking and perspectives in the greater field of education in computational biology and bioinformatics. For example, articles that highlight the use of new tools such as those used in cloud computing or methods for using third- and fourth-generation sequencing technologies are encouraged. We would also like to see articles that incorporate best practices in teaching, including the use of new media, flexible online teaching tools, and the use and re-use of large, well-defined data sets that are computed on in classes, courses, and programs. We encourage articles that highlight new types of training initiatives, the use of workflows to help students on the path to reproducibility in science, and open course materials (open lecture notes and open course notes and datasets for exercises) that reach more learners. In the end, the Education section belongs to the community and thus comes with responsibilities. We need to identify the gaps and the material with which we want to educate ourselves; we need to recognize and encourage great teachers and writers to communicate openly about what works best with specific methods. We invite you to contact us via ploscompbiol@plos.org with your ideas for the kind of articles you would like to see in the PLOS Computational Biology Education section. We hope to see you in the classroom soon, where we learn together. About The Authors Joanne A. Fox (@joannealisonfox on Twitter) has a PhD in Genetics from the University of British Columbia (UBC). As a faculty member at the Michael Smith Laboratories and in the Department of Microbiology and Immunology at UBC, she is involved in a range of education and outreach initiatives at the undergraduate and secondary school levels, and teaches a variety of courses. She is a former instructor and current review committee member of the Canadian Bioinformatics.ca Workshops. B.F. Francis Ouellette (@bffo on Twitter) did his graduate studies in Developmental Biology and is now an Associate Professor in Cell and Systems Biology at the University of Toronto, as well as a senior scientist and Associate Director of Informatics and Biocomputing at the Ontario Institute for Cancer Research. He was one of the founders and is still the scientific director and an instructor for the Canadian Bioinformatics.ca Workshops.
The authors have worked together in the past, and have known each other for more than 15 years.
- Published
- 2013
28. Task-based core-periphery organization of human brain dynamics
- Author
-
Danielle S. Bassett, Peter J. Mucha, Scott T. Grafton, Nicholas F. Wymbs, Mason A. Porter, and M. Puck Rombach
- Subjects
Computer science ,Brain activity and meditation ,Bioinformatics ,Association (object-oriented programming) ,Mathematical Sciences ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Information and Computing Sciences ,Task Performance and Analysis ,Genetics ,medicine ,Humans ,Learning ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Motor skill ,030304 developmental biology ,0303 health sciences ,Ecology ,medicine.diagnostic_test ,Artificial neural network ,Control reconfiguration ,Brain ,Cognition ,Human brain ,Biological Sciences ,medicine.anatomical_structure ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Functional magnetic resonance imaging ,Neuroscience ,030217 neurology & neurosurgery ,Research Article - Abstract
As a person learns a new skill, distinct synapses, brain regions, and circuits are engaged and change over time. In this paper, we develop methods to examine patterns of correlated activity across a large set of brain regions. Our goal is to identify properties that enable robust learning of a motor skill. We measure brain activity during motor sequencing and characterize network properties based on coherent activity between brain regions. Using recently developed algorithms to detect time-evolving communities, we find that the complex reconfiguration patterns of the brain's putative functional modules that control learning can be described parsimoniously by the combined presence of a relatively stiff temporal core that is composed primarily of sensorimotor and visual regions whose connectivity changes little in time and a flexible temporal periphery that is composed primarily of multimodal association regions whose connectivity changes frequently. The separation between temporal core and periphery changes over the course of training and, importantly, is a good predictor of individual differences in learning success. The core of dynamically stiff regions exhibits dense connectivity, which is consistent with notions of core-periphery organization established previously in social networks. Our results demonstrate that core-periphery organization provides an insightful way to understand how putative functional modules are linked. This, in turn, enables the prediction of fundamental human capacities, including the production of complex goal-directed behavior., Author Summary When someone learns a new skill, his/her brain dynamically alters individual synapses, regional activity, and larger-scale circuits. In this paper, we capture some of these dynamics by measuring and characterizing patterns of coherent brain activity during the learning of a motor skill. We extract time-evolving communities from these patterns and find that a temporal core that is composed primarily of primary sensorimotor and visual regions reconfigures little over time, whereas a periphery that is composed primarily of multimodal association regions reconfigures frequently. The core consists of densely connected nodes, and the periphery consists of sparsely connected nodes. Individual participants with a larger separation between core and periphery learn better in subsequent training sessions than individuals with a smaller separation. Conceptually, core-periphery organization provides a framework in which to understand how putative functional modules are linked. This, in turn, enables the prediction of fundamental human capacities, including the production of complex goal-directed behavior.
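One way to make the stiff-core/flexible-periphery idea concrete is a node "flexibility" score: the fraction of consecutive time windows in which a region's community assignment changes. The sketch below assumes community labels per window have already been obtained (for example, from a multilayer modularity method) and uses toy labels; it illustrates the general idea rather than the authors' pipeline.

```python
import numpy as np

def flexibility(community_labels):
    """community_labels: array of shape (n_nodes, n_windows) giving the community
    assignment of each brain region in each time window.  Flexibility of a node
    is the fraction of consecutive windows in which its assignment changes."""
    labels = np.asarray(community_labels)
    changes = labels[:, 1:] != labels[:, :-1]
    return changes.mean(axis=1)

# Toy example: 4 regions over 6 windows; regions 0-1 behave like a stable core,
# regions 2-3 like a frequently reassigned periphery.
labels = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 1, 1],
    [1, 2, 3, 1, 2, 3],
    [2, 3, 1, 3, 2, 1],
])
flex = flexibility(labels)
core = np.where(flex < 0.5)[0]
periphery = np.where(flex >= 0.5)[0]
print("flexibility:", flex, "core:", core, "periphery:", periphery)
```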
- Published
- 2013
29. Introduction to translational bioinformatics collection
- Author
-
Russ B. Altman
- Subjects
Translational bioinformatics ,Imaging informatics ,Ecology ,Computer science ,business.industry ,Translational medicine ,Translational research ,Computational biology ,Medical research ,Health informatics ,Education ,World Wide Web ,Cellular and Molecular Neuroscience ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Informatics ,Computer Science ,Genetics ,Medicine ,Translational research informatics ,business ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics - Abstract
How should we define translational bioinformatics? I had to answer this question unambiguously in March 2008 when I was asked to deliver a review of “recent progress in translational bioinformatics” at the American Medical Informatics Association’s Summit on Translational Bioinformatics. The lecture required me to define papers in the field, and then highlight exciting progress that occurred over the previous ∼12 months. I have repeated this for the last few years, and the most difficult part of the exercise is limiting my review only to those papers that are within the field. I have never worried much about definitions within informatics fields; they tend to overlap, merge and evolve. “Informatics” seems clear: the study of how to represent, store, search, retrieve and analyze information. The adjectives in front of “informatics” vary but also tend to make sense: medical informatics concerns medical information, bioinformatics concerns basic biological information, clinical informatics focuses on the clinical delivery part of medical informatics, biomedical informatics merges bioinformatics and medical informatics, imaging informatics focuses on...images, and so on. So what does this adjective “translational” denote? Translational medical research has emerged as an important theme in the last decade. Starting with top-down leadership from the National Institutes of Health and its former Director, Dr. Elias Zerhouni, and moving through academic medical centers, research institutes and industrial research and development efforts, there has been interest in more effectively moving the discoveries and innovations of the laboratory to the bedside, leading to improved diagnosis, prognosis, and treatment. Translational research encompasses many activities including the creation of medical devices, molecular diagnostics, small molecule therapeutics, biological therapeutics, vaccines, and others. One of the main targets of translation, however, is the revolutionary explosion of knowledge in molecular biology, genetics, and genomics. Some believe that the tremendous progress in discovery over the last 50+ years since elucidation of the double helix structure has not translated (there’s that word!) into much practical health benefit. While the accuracy of this claim can be debated, there can be no debate that our ability to measure (1) DNA sequence (including entire genomes!), (2) RNA sequence and expression, (3) protein sequence, structure, expression and modification, and (4) small molecule metabolite structure, presence, and quantity has advanced rapidly and enables us to imagine fantastic new technologies in pursuit of human health. There are many barriers to translating our molecular understanding into technologies that impact patients. These include understanding health market size and forces, the regulatory milieu, how to harden the technology for routine use, and how to navigate an increasingly complex intellectual property landscape. But before those activities can begin, we must overcome an even more fundamental barrier: connecting the stuff of molecular biology to the clinical world. Molecular and cellular biology studies genes, DNA, messenger RNAs, microRNAs, proteins, signaling molecules and their cascades, metabolites, cellular communication processes and cellular organization. These data are freely available in valuable resources such as GenBank (http://www.ncbi.nlm.nih.gov/genbank/), Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/), Protein Data Bank (http://www.wwpdb.org/), KEGG (http://www.genome.jp/kegg/), MetaCyc (http://metacyc.org/), Reactome (http://www.reactome.org), and many other resources. The clinical world studies diseases, signs, symptoms, drugs, patients, clinical laboratory measurements, and clinical images. The emergence of clinical and health information technologies has begun to make these clinical data available for research through biobanks, electronic medical records, FDA resources about drug labels and adverse events, and claims data. Therefore, a major challenge for translational medicine is to connect the molecular/cellular world with the clinical world. The published literature, available in PubMed (http://www.ncbi.nlm.nih.gov/pubmed), does this, as does the Unified Medical Language System (UMLS) that provides a lingua franca (http://www.nlm.nih.gov/research/umls/). However, it falls to translational bioinformatics to engineer the tools that link molecular/cellular entities and clinical entities. Thus, I define “translational bioinformatics” research as the development and application of informatics methods that connect molecular entities to clinical entities. In this collection, Dr. Kann and colleagues have assembled a wonderful group of authors to introduce the key threads of translational bioinformatics to those new to the field. The collection first provides
- Published
- 2012
30. The p7 protein of hepatitis C virus forms structurally plastic, minimalist ion channels
- Author
-
Danielle E. Chandler, Klaus Schulten, François Penin, Christophe Chipot, Institut de biologie et chimie des protéines [Lyon] (IBCP), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Structure et Réactivité des Systèmes Moléculaires Complexes (SRSMC), and Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut de Chimie du CNRS (INC)
- Subjects
Future studies ,Protein Conformation ,[SDV]Life Sciences [q-bio] ,medicine.disease_cause ,Ion Channels ,Molecular dynamics ,0302 clinical medicine ,Protein structure ,MESH: Protein Conformation ,Computational Chemistry ,MESH: Molecular Dynamics Simulation ,lcsh:QH301-705.5 ,0303 health sciences ,Ecology ,Chemistry ,MESH: Models, Chemical ,3. Good health ,Infectious Diseases ,Computational Theory and Mathematics ,Biochemistry ,Modeling and Simulation ,Medicine ,030211 gastroenterology & hepatology ,Dimerization ,Porosity ,Research Article ,Hepatitis C virus ,Biophysics ,Molecular Dynamics Simulation ,Ion ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Viral Proteins ,MESH: Porosity ,MESH: Computer Simulation ,Elastic Modulus ,Genetics ,medicine ,Computer Simulation ,Molecular Biology ,Biology ,Ecology, Evolution, Behavior and Systematics ,Ion channel ,030304 developmental biology ,Computational Biology ,MESH: Viral Proteins ,Ion channel activity ,MESH: Dimerization ,Models, Chemical ,lcsh:Biology (General) ,MESH: Ion Channels ,MESH: Elastic Modulus ,Function (biology) - Abstract
Hepatitis C virus (HCV) p7 is a membrane-associated oligomeric protein harboring ion channel activity. It is essential for effective assembly and release of infectious HCV particles and an attractive target for antiviral intervention. Yet, the self-assembly and molecular mechanism of p7 ion channelling are currently only partially understood. Using molecular dynamics simulations (aggregate time 1.2 µs), we show that p7 can form stable oligomers of four to seven subunits, with a bias towards six or seven subunits, and suggest that p7 self-assembles in a sequential manner, with tetrameric and pentameric complexes forming as intermediate states leading to the final hexameric or heptameric assembly. We describe a model of a hexameric p7 complex, which forms a transiently-open channel capable of conducting ions in simulation. We investigate the ability of the hexameric model to flexibly rearrange to adapt to the local lipid environment, and demonstrate how this model can be reconciled with low-resolution electron microscopy data. In the light of these results, a view of p7 oligomerization is proposed, wherein hexameric and heptameric complexes may coexist, forming minimalist, yet robust functional ion channels. In the absence of a high-resolution p7 structure, the models presented in this paper can prove valuable as a substitute structure in future studies of p7 function, or in the search for p7-inhibiting drugs., Author Summary Hepatitis C remains a serious global health problem affecting more than 2% of the world's population, and current therapies are effective in only a subset of patients, necessitating an ongoing search for new treatments. The p7 viroporin is considered to be an attractive possible drug target, but rational drug design is hampered by the absence of a high-resolution p7 structure. In this paper, we explore possible structures of oligomeric p7 channels, and discuss the strengths and shortcomings of these models with respect to experimentally determined properties, such as pore-lining residues, ion conductance, and compatibility with low-resolution electron microscopy images. Our results present an image of p7 as a rudimentary, minimalistic ion channel, capable of existing in multiple oligomeric states but exhibiting a bias towards hexamers and heptamers. We believe that the work presented here will be valuable for future research by providing plausible 3-dimensional atomic-resolution models for the visualization of the p7 viroporin and serve as a basis for future computational studies.
- Published
- 2012
- Full Text
- View/download PDF
31. Dominant glint based prey localization in horseshoe bats: a possible strategy for noise rejection
- Author
-
Herbert Peremans, Dieter Vanderelst, Uwe Firzlaff, and Jonas Reijniers
- Subjects
Entropy ,Acoustics ,Speech recognition ,Human echolocation ,Information loss ,Biology ,Models, Biological ,Sonar ,Behavioral Ecology ,Cellular and Molecular Neuroscience ,Chiroptera ,Genetics ,Animals ,Computer Simulation ,Ear, External ,Theoretical Biology ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Computational Neuroscience ,Computer. Automation ,Evolutionary Biology ,Animal Behavior ,Ecology ,Spatial filter ,Physics ,Computational Biology ,Sensory Systems ,Chemistry ,Amplitude ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Evolutionary Ecology ,Echolocation ,Modeling and Simulation ,Sensory Perception ,Noise ,Mathematics ,Research Article ,Neuroscience - Abstract
Rhinolophidae, or horseshoe bats, emit long, narrowband calls. Fluttering insect prey generates echoes in which amplitude and frequency shifts, i.e. glints, are present. These glints are reliable cues about the presence of prey and also encode certain properties of the prey. In this paper, we propose that these glints, i.e. the dominant glints, are also reliable signals upon which to base prey localization. In contrast to the spectral cues used by many other bats, the localization cues in Rhinolophidae are most likely provided by self-induced amplitude modulations generated by pinnae movement. Amplitude variations in the echo not introduced by the moving pinnae can be considered as noise interfering with the localization process. The amplitude of the dominant glints is very stable. Therefore, these parts of the echoes contain very little noise. However, using only the dominant glints potentially comes at a cost. Depending on the flutter rate of the insect, a limited number of dominant glints will be present in each echo, giving the bat a limited number of sample points on which to base localization. We evaluate the feasibility of a strategy under which Rhinolophidae use only dominant glints. We use a computational model of the echolocation task faced by Rhinolophidae. Our model includes the spatial filtering of the echoes by the morphology of the sonar apparatus of Rhinolophus rouxii as well as the amplitude modulations introduced by pinnae movements. Using this model, we evaluate whether the dominant glints provide Rhinolophidae with enough information to perform localization. Our simulations show that Rhinolophidae can use dominant glints in the echoes as carriers for self-induced amplitude modulations serving as localization cues. In particular, it is shown that the reduction in noise achieved by using only the dominant glints outweighs the information loss that occurs by sampling the echo., Author Summary Rhinolophidae are echolocating bats that hunt among vegetation. The foliage returns clutter echoes that potentially mask the echoes of insect prey. However, prey introduces into the echo frequency and amplitude shifts, called glints, to which these bats are highly sensitive. Therefore, these glints are used by Rhinolophidae to detect prey and infer its properties. One of the defining characteristics of consecutive dominant glints is that they have a very stable amplitude. That is, consecutive wing beats of the insect produce dominant glints with more or less the same amplitude. Given the strategy Rhinolophidae use to locate prey, the stable amplitude of the glints makes these parts of the echoes ideal signals for localization. In this paper, we demonstrate the feasibility of a strategy under which Rhinolophidae use only the dominant glints in the echo for locating prey.
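A minimal way to illustrate the "use only the strong, stable parts of the echo" idea is to flag samples that are both high-amplitude and consistent across consecutive echoes. The sketch below uses a synthetic envelope and an arbitrary 0.8 amplitude threshold; it is a caricature of glint selection, not the paper's acoustic and pinna-movement model.

```python
import numpy as np

def dominant_glints(echoes, amp_threshold=0.8):
    """echoes: array (n_echoes, n_samples) of rectified echo envelopes.
    A sample is treated as a dominant glint if its amplitude exceeds
    amp_threshold * max in every echo, i.e. it is both strong and stable."""
    echoes = np.asarray(echoes)
    strong = echoes > amp_threshold * echoes.max(axis=1, keepdims=True)
    return np.where(strong.all(axis=0))[0]

# Toy envelopes: a stable glint at sample 30, weaker fluctuating returns elsewhere.
rng = np.random.default_rng(0)
echoes = 0.2 * rng.random((5, 100))
echoes[:, 30] = 1.0 + 0.02 * rng.standard_normal(5)   # stable, high-amplitude glint
echoes[:, 70] = rng.random(5)                          # unstable reflection
print("dominant glint samples:", dominant_glints(echoes))
```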
- Published
- 2011
32. A Differentiation-Based Phylogeny of Cancer Subtypes
- Author
-
Robert J. Downey, Markus Riester, Franziska Michor, Samuel Singer, and Camille Stephan-Otto Attolini
- Subjects
Cellular differentiation ,Evolutionary Biology/Bioinformatics ,Oncology/Sarcomas ,Liposarcoma ,Biology ,Bioinformatics ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Breast cancer ,Phylogenetics ,Neoplasms ,Genetics ,medicine ,Cluster Analysis ,Humans ,Oncology/Hematological Malignancies ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,030304 developmental biology ,0303 health sciences ,Evolutionary Biology ,Analysis of Variance ,Adipogenesis ,Ecology ,Phylogenetic tree ,Gene Expression Profiling ,Cancer ,Computational Biology ,Cell Differentiation ,medicine.disease ,3. Good health ,Gene expression profiling ,Gene Expression Regulation, Neoplastic ,Leukemia ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Oncology ,030220 oncology & carcinogenesis ,Modeling and Simulation ,Oncology/Breast Cancer ,Algorithms ,Research Article - Abstract
Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. In this paper, we introduce a novel computational algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia, breast cancer and liposarcoma subtypes and then apply it to a broader group of sarcomas. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors., Author Summary Gene expression profiling of malignancies is often held to demonstrate genes that are “up-regulated” or “down-regulated”, but the appropriate frame of reference against which observations should be compared has not been determined. Fully differentiated somatic cells arise from stem cells, with changes in gene expression that can be experimentally determined. If cancers arise as the result of an abruption of the differentiation process, then poorly differentiated cancers would have a gene expression more similar to stem cells than to normal differentiated tissue, and well differentiated cancers would have a gene expression more similar to fully differentiated cells than to stem cells. In this paper, we describe a novel computational algorithm that allows orientation of cancer gene expression between the poles of the gene expression of stem cells and of fully differentiated tissue. Our methodology allows the construction of a multi-branched phylogeny of human malignancies and can be used to identify genes related to differentiation as well as novel therapeutic targets.
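The core of the ranking step is placing each tumor subtype's expression profile between a stem-cell pole and a fully differentiated pole. The sketch below does this with a simple relative-distance score on toy five-gene profiles; the published algorithm is more involved (and builds a phylogenetic tree on top of the ranking), so treat this only as an illustration of the orientation idea, with invented numbers.

```python
import numpy as np

def differentiation_score(expr, stem_centroid, diff_centroid):
    """Position of a sample between stem-like (0) and fully differentiated (1)
    expression, measured by relative Euclidean distance to the two centroids."""
    d_stem = np.linalg.norm(expr - stem_centroid)
    d_diff = np.linalg.norm(expr - diff_centroid)
    return d_stem / (d_stem + d_diff)

# Toy 5-gene expression centroids and tumor profiles (hypothetical values).
stem = np.array([9.0, 8.5, 1.0, 1.2, 0.8])     # stem-cell centroid
diff = np.array([1.0, 1.5, 8.0, 9.0, 7.5])     # differentiated-tissue centroid
tumors = {
    "poorly differentiated subtype": np.array([8.0, 7.5, 2.0, 2.0, 1.5]),
    "well differentiated subtype":   np.array([2.5, 2.0, 7.0, 8.0, 6.5]),
}
ranked = sorted(tumors.items(),
                key=lambda kv: differentiation_score(kv[1], stem, diff))
for name, expr in ranked:
    print("%-32s score = %.2f" % (name, differentiation_score(expr, stem, diff)))
```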
- Published
- 2010
33. OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions
- Author
-
Sridhar Ranganathan, Costas D. Maranas, and Patrick F. Suthers
- Subjects
Systems biology ,In silico ,Succinic Acid ,Biology ,Cellular and Molecular Neuroscience ,Computational Biology/Metabolic Networks ,Genetics ,Escherichia coli ,Production (economics) ,Computer Simulation ,Overproduction ,Set (psychology) ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Computational Biology/Systems Biology ,Models, Statistical ,Ecology ,Models, Genetic ,business.industry ,Systems Biology ,Biotechnology ,Identification (information) ,Metabolic Model ,Computational Theory and Mathematics ,Biochemistry/Bioinformatics ,Gene Expression Regulation ,lcsh:Biology (General) ,Modeling and Simulation ,Classification rule ,Biochemical engineering ,Biotechnology/Bioengineering ,business ,Genetic Engineering ,Algorithms ,Metabolic Networks and Pathways ,Research Article - Abstract
Computational procedures for predicting metabolic interventions leading to the overproduction of biochemicals in microbial strains are widely in use. However, these methods rely on surrogate biological objectives (e.g., maximize growth rate or minimize metabolic adjustments) and do not make use of flux measurements often available for the wild-type strain. In this work, we introduce the OptForce procedure that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target. We hierarchically apply this classification rule for pairs, triples, quadruples, etc. of reactions. This leads to the identification of a sufficient and non-redundant set of fluxes that must change (i.e., MUST set) to meet a pre-specified overproduction target. Starting with this set we subsequently extract a minimal set of fluxes that must actively be forced through genetic manipulations (i.e., FORCE set) to ensure that all fluxes in the network are consistent with the overproduction objective. We demonstrate our OptForce framework for succinate production in Escherichia coli using the most recent in silico E. coli model, iAF1260. The method not only recapitulates existing engineering strategies but also reveals non-intuitive ones that boost succinate production by performing coordinated changes on pathways distant from the last steps of succinate synthesis., Author Summary Over the past few years, there has been an unprecedented increase in the use of microorganisms for the production of biofuels, industrial chemicals and pharmaceutical precursors. In this regard, biotechnologists are confronted with the challenge to efficiently convert biomass and other renewable resources into useful biochemicals. With the advent of organism-specific mathematical models of metabolism, scientists have used computations to identify genetic modifications that maximize the yield of a desired product. In this paper, we introduce OptForce, an algorithm that identifies all possible metabolic interventions that lead to the overproduction of a biochemical of interest. Unlike existing techniques, OptForce does not rely on the maximization of a fitness function to predict metabolic fluxes. Instead, OptForce contrasts the metabolic flux patterns observed in an initial strain and a strain overproducing the chemical at the target yield. The essence of this procedure is the identification of all coordinated reaction modifications that force the network towards the overproduction target. We used OptForce to predict metabolic interventions for succinate overproduction in Escherichia coli. The results described in this paper not only uncover existing strain designs for succinate production but also elucidate new ones that can be experimentally explored.
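The classification of reactions into "must increase" or "must decrease" sets can be illustrated with flux variability analysis on a toy network: compute each reaction's feasible flux range under wild-type constraints and again when the overproduction target is imposed, then flag reactions whose ranges no longer overlap. This is a simplified sketch of that idea using scipy's linear-programming solver, not the OptForce formulation itself; the three-reaction network and its bounds are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network, steady state S·v = 0 for one internal metabolite A:
#   v0: uptake -> A       v1: A -> biomass       v2: A -> product (secreted)
S = np.array([[1.0, -1.0, -1.0]])

def flux_range(S, bounds, idx):
    """Min and max of flux idx subject to S·v = 0 and the given bounds (toy FVA)."""
    lo = linprog(c=np.eye(len(bounds))[idx],  A_eq=S, b_eq=[0.0], bounds=bounds)
    hi = linprog(c=-np.eye(len(bounds))[idx], A_eq=S, b_eq=[0.0], bounds=bounds)
    return lo.x[idx], hi.x[idx]

wild_type    = [(10, 10), (0, 10), (0, 1)]   # uptake fixed, little product observed
overproducer = [(10, 10), (0, 10), (8, 10)]  # forced to meet the overproduction target

for i, name in enumerate(["uptake", "biomass", "product"]):
    wt_lo, wt_hi = flux_range(S, wild_type, i)
    ov_lo, ov_hi = flux_range(S, overproducer, i)
    if ov_lo > wt_hi:
        verdict = "MUST increase"
    elif ov_hi < wt_lo:
        verdict = "MUST decrease"
    else:
        verdict = "no forced change"
    print(f"{name:8s} WT [{wt_lo:.1f},{wt_hi:.1f}]  target [{ov_lo:.1f},{ov_hi:.1f}]  {verdict}")
```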
- Published
- 2010
34. Alu and b1 repeats have been selectively retained in the upstream and intronic regions of genes of specific functional classes
- Author
-
Aristotelis Tsirigos and Isidore Rigoutsos
- Subjects
Genome evolution ,Pan troglodytes ,Alu element ,Computational Biology/Comparative Sequence Analysis ,Biology ,Genome ,Cellular and Molecular Neuroscience ,Negative selection ,Mice ,Alu Elements ,Genetics ,Animals ,Humans ,Signal recognition particle RNA ,Selection, Genetic ,Evolutionary Biology/Genomics ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Repetitive Sequences, Nucleic Acid ,Ecology ,Models, Genetic ,Intron ,Chromosome Mapping ,Computational Biology ,Introns ,Rats ,Computational Theory and Mathematics ,Genes ,lcsh:Biology (General) ,Evolutionary biology ,Modeling and Simulation ,Computational Biology/Sequence Motif Analysis ,Mobile genetic elements ,Functional genomics ,Research Article - Abstract
Alu and B1 repeats are mobile elements that originated in an initial duplication of the 7SL RNA gene prior to the primate-rodent split about 80 million years ago and currently account for a substantial fraction of the human and mouse genome, respectively. Following the primate-rodent split, Alu and B1 elements spread independently in each of the two genomes in a seemingly random manner, and, according to the prevailing hypothesis, negative selection shaped their final distribution in each genome by forcing the selective loss of certain Alu and B1 copies. In this paper, contrary to the prevailing hypothesis, we present evidence that Alu and B1 elements have been selectively retained in the upstream and intronic regions of genes belonging to specific functional classes. At the same time, we found no evidence for selective loss of these elements in any functional class. A subset of the functional links we discovered corresponds to functions where Alu involvement has actually been experimentally validated, whereas the majority of the functional links we report are novel. Finally, the unexpected finding that Alu and B1 elements show similar biases in their distribution across functional classes, despite having spread independently in their respective genomes, further supports our claim that the extant instances of Alu and B1 elements are the result of positive selection., Author Summary Despite their fundamental role in cell regulation, genes account for less than 1% of the human genome. Recent studies have shown that non-genic regions of our DNA may also play an important functional role in human cells. In this paper, we study Alu and B elements, a specific class of such non-genic elements that account for ∼10% of the human genome and ∼7% of the mouse genome respectively. We show that, contrary to the prevailing hypothesis, Alu and B elements have been preferentially retained in the proximity of genes that perform specific functions in the cell. In contrast, we found no evidence for selective loss of these elements in any functional class. Several of the functional classes that we have linked to Alu and B elements are central to the proper working of the cell, and their disruption has previously been shown to lead to the onset of disease. Interestingly, the DNA sequences of Alu and B elements differ substantially between human and mouse, thus hinting at the existence of a potentially large number of non-conserved regulatory elements.
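Over-representation of repeat-bearing genes within a functional class is the kind of question a hypergeometric test answers. The sketch below shows such a test on invented gene counts; it illustrates the style of enrichment analysis rather than the exact statistic and controls used in the paper.

```python
from scipy.stats import hypergeom

def repeat_enrichment(n_genes, n_with_repeat, class_size, class_with_repeat):
    """P-value for seeing at least class_with_repeat repeat-containing genes in a
    functional class of class_size genes, drawn from n_genes of which
    n_with_repeat carry an upstream or intronic Alu/B1 element."""
    # sf(k - 1) = P(X >= k) under the hypergeometric null of no association
    return hypergeom.sf(class_with_repeat - 1, n_genes, n_with_repeat, class_size)

# Toy numbers: 20,000 genes, 6,000 with a retained repeat; one functional class of
# 300 genes, 150 of which carry a repeat (about 90 would be expected under the null).
p = repeat_enrichment(20000, 6000, 300, 150)
print(f"enrichment p-value = {p:.2e}")
```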
- Published
- 2009
35. Predicting positive p53 cancer rescue regions using Most Informative Positive (MIP) active learning
- Author
-
Linda V. Hall, Peter K. Kaiser, Lydia Ho, Kirsty Anne Lily Salmon, Richard H. Lathrop, Roberta Baronio, Samuel A. Danziger, and G. Wesley Hatfield
- Subjects
Models, Molecular ,In silico ,Mutant ,DNA Mutational Analysis ,Biology ,Protein Engineering ,law.invention ,Fungal Proteins ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Text mining ,In vivo ,law ,Artificial Intelligence ,Yeasts ,Genetics ,Tumor regression ,Humans ,Computer Simulation ,Molecular Biology ,Gene ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Computational Biology/Synthetic Biology ,030304 developmental biology ,0303 health sciences ,Ecology ,Models, Genetic ,business.industry ,Biochemistry/Structural Genomics ,Computational Biology ,Reproducibility of Results ,Protein engineering ,Biochemistry/Molecular Evolution ,Computational Theory and Mathematics ,Oncology ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,Modeling and Simulation ,Mutation ,Suppressor ,Biotechnology/Bioengineering ,Tumor Suppressor Protein p53 ,business ,Algorithms ,Research Article - Abstract
Many protein engineering problems involve finding mutations that produce proteins with a particular function. Computational active learning is an attractive approach to discover desired biological activities. Traditional active learning techniques have been optimized to iteratively improve classifier accuracy, not to quickly discover biologically significant results. We report here a novel active learning technique, Most Informative Positive (MIP), which is tailored to biological problems because it seeks novel and informative positive results. MIP active learning differs from traditional active learning methods in two ways: (1) it preferentially seeks Positive (functionally active) examples; and (2) it may be effectively extended to select gene regions suitable for high throughput combinatorial mutagenesis. We applied MIP to discover mutations in the tumor suppressor protein p53 that reactivate mutated p53 found in human cancers. This is an important biomedical goal because p53 mutants have been implicated in half of all human cancers, and restoring active p53 in tumors leads to tumor regression. MIP found Positive (cancer rescue) p53 mutants in silico using 33% fewer experiments than traditional non-MIP active learning, with only a minor decrease in classifier accuracy. Applying MIP to in vivo experimentation yielded immediate Positive results. Ten different p53 mutations found in human cancers were paired in silico with all possible single amino acid rescue mutations, from which MIP was used to select a Positive Region predicted to be enriched for p53 cancer rescue mutants. In vivo assays showed that the predicted Positive Region: (1) had significantly more (p, Author Summary Engineering proteins to acquire or enhance a particular useful function is at the core of many biomedical problems. This paper presents Most Informative Positive (MIP) active learning, a novel integrated computational/biological approach designed to help guide biological discovery of novel and informative positive mutants. A classifier, together with modeled structure-based features, helps guide biological experiments and so accelerates protein engineering studies. MIP reduces the number of expensive biological experiments needed to achieve novel and informative positive results. We used the MIP method to discover novel p53 cancer rescue mutants. p53 is a tumor suppressor protein, and destructive p53 mutations have been implicated in half of all human cancers. Second-site cancer rescue mutations restore p53 activity and eventually may facilitate rational design of better cancer drugs. This paper shows that, even in the first round of in vivo experiments, MIP significantly increased the discovery rate of novel and informative positive mutants.
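The selection heuristic can be contrasted with conventional uncertainty sampling in a few lines: train a classifier on the labelled mutants, then pick the unlabeled candidate with the highest predicted probability of being Positive rather than the one closest to the decision boundary. The sketch below uses scikit-learn logistic regression on synthetic feature vectors and is only a caricature of MIP, which additionally balances informativeness and can select whole gene regions for combinatorial mutagenesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy feature vectors for p53 mutants (e.g. structure-derived features) and labels
# for an initial labelled batch: 1 = active (cancer rescue), 0 = inactive.
X_labeled = rng.standard_normal((40, 5))
y_labeled = (X_labeled[:, 0] + 0.5 * X_labeled[:, 1] > 0).astype(int)
X_pool = rng.standard_normal((200, 5))           # unlabeled candidate mutants

clf = LogisticRegression().fit(X_labeled, y_labeled)
p_positive = clf.predict_proba(X_pool)[:, 1]

# Traditional uncertainty sampling picks the example closest to p = 0.5;
# MIP-style selection instead asks for the most likely *positive* candidate.
uncertainty_pick = int(np.argmin(np.abs(p_positive - 0.5)))
mip_pick = int(np.argmax(p_positive))
print("uncertainty sampling picks candidate", uncertainty_pick,
      "with P(active) = %.2f" % p_positive[uncertainty_pick])
print("MIP-style selection picks candidate ", mip_pick,
      "with P(active) = %.2f" % p_positive[mip_pick])
```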
- Published
- 2009
36. Qualia: the geometry of integrated information
- Author
-
Giulio Tononi and David Balduzzi
- Subjects
Theoretical computer science ,Consciousness ,Databases, Factual ,Computer science ,media_common.quotation_subject ,Information Theory ,Computational Biology/Computational Neuroscience ,Qualia ,Information theory ,Cellular and Molecular Neuroscience ,Artificial Intelligence ,Perception ,Genetics ,Entropy (information theory) ,Quality of experience ,Neuroscience/Theoretical Neuroscience ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Subdivision ,media_common ,Neuroscience/Cognitive Neuroscience ,Ecology ,business.industry ,Integrated information theory ,Computational Biology ,Models, Theoretical ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Artificial intelligence ,business ,Algorithms ,Mathematics ,Research Article - Abstract
According to the integrated information theory, the quantity of consciousness is the amount of integrated information generated by a complex of elements, and the quality of experience is specified by the informational relationships it generates. This paper outlines a framework for characterizing the informational relationships generated by such systems. Qualia space (Q) is a space having an axis for each possible state (activity pattern) of a complex. Within Q, each submechanism specifies a point corresponding to a repertoire of system states. Arrows between repertoires in Q define informational relationships. Together, these arrows specify a quale—a shape that completely and univocally characterizes the quality of a conscious experience. Φ— the height of this shape—is the quantity of consciousness associated with the experience. Entanglement measures how irreducible informational relationships are to their component relationships, specifying concepts and modes. Several corollaries follow from these premises. The quale is determined by both the mechanism and state of the system. Thus, two different systems having identical activity patterns may generate different qualia. Conversely, the same quale may be generated by two systems that differ in both activity and connectivity. Both active and inactive elements specify a quale, but elements that are inactivated do not. Also, the activation of an element affects experience by changing the shape of the quale. The subdivision of experience into modalities and submodalities corresponds to subshapes in Q. In principle, different aspects of experience may be classified as different shapes in Q, and the similarity between experiences reduces to similarities between shapes. Finally, specific qualities, such as the “redness” of red, while generated by a local mechanism, cannot be reduced to it, but require considering the entire quale. Ultimately, the present framework may offer a principled way for translating qualitative properties of experience into mathematics., Author Summary In prior work, we suggested that consciousness has to do with integrated information, which was defined as the amount of information generated by a system in a given state, above and beyond the information generated independently by its parts. In the present paper, we move from computing the quantity of integrated information to describing the structure or quality of the integrated information unfolded by interactions in the system. We take a geometric approach, introducing the notion of a quale as a shape that embodies the entire set of informational relationships generated by interactions in the system. The paper investigates how features of the quale relate to properties of the underlying system and also to basic features of experience, providing the beginnings of a mathematical dictionary relating neurophysiology to the geometry of the quale and the geometry to phenomenology.
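A toy calculation helps ground the notion of information generated by a mechanism about its previous state. The sketch below uses a two-element deterministic system and computes, for each current state, the reduction in uncertainty about the prior state relative to a uniform prior; this follows the general spirit of effective-information measures but is not the paper's Q-space construction, and the update rule is invented.

```python
import numpy as np
from itertools import product

def update(state):
    """Toy deterministic mechanism on two binary elements:
    element 0 copies element 1, element 1 is the AND of both inputs."""
    a, b = state
    return (b, a & b)

states = list(product([0, 1], repeat=2))

def effective_information(current):
    """Information the system generates about its previous state when it finds
    itself in `current`: the posterior over past states is uniform over the
    pre-images of `current`, and the prior is uniform over all states."""
    preimages = [s for s in states if update(s) == current]
    if not preimages:
        return 0.0                    # unreachable state generates no information
    return np.log2(len(states)) - np.log2(len(preimages))

for s in states:
    n_pre = sum(update(x) == s for x in states)
    print(f"state {s}: reachable from {n_pre} states, ei = {effective_information(s):.2f} bits")
```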
- Published
- 2009
37. Getting Started in Text Mining
- Author
-
Michael Seringhaus, Andrey Rzhetsky, and Mark Gerstein
- Subjects
Phrase ,Computer science ,media_common.quotation_subject ,Information Storage and Retrieval ,Scientific literature ,computer.software_genre ,Message from ISCB ,Business process discovery ,Cellular and Molecular Neuroscience ,Artificial Intelligence ,Reading (process) ,Genetics ,Question answering ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,media_common ,Information retrieval ,Ecology ,Computational Biology ,Data science ,Databases, Bibliographic ,Conjunction (grammar) ,Computational Biology/Literature Analysis ,Information extraction ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Database Management Systems ,computer ,Sentence - Abstract
We are, in a sense, drowning in information. Today, it is unusual for scientists even to read a journal cover to cover—much less to personally parse all information pertinent to even a narrow research area. Increasingly complex content, large digital supplements, and a staggering volume of publications are now threatening old-fashioned scientific reading with extinction. But by using computers to sift through and scour published articles, the nascent technology of text mining promises to automate the rote information-gathering stage—hopefully leaving to human minds the more challenging (and rewarding) activity of higher thinking. This article is intended to continue where Cohen and Hunter [1] left off in “Getting Started in Text Mining,” an introduction in the January 2008 issue of PLoS Computational Biology which covered the actual mining of text and its digestion into small quanta of computer-manageable information (http://www.ploscompbiol.org/doi/pcbi.0040020). In this overview of the field, we begin by summarizing the major stages of current text-processing pipelines. We then focus on the downstream questions scientists can ask using text-mining and literature-mining engines. At times, we (deliberately) blur the boundary between today's approaches and tomorrow's possibilities. Figure 1 shows a high-level overview of the stages in text mining, with a focus on its applications. We begin at the top left of the figure, which shows the process of information retrieval—how we select relevant documents [2]. Unfortunately, free full-text access remains impossible for a large portion of scientific journals. In some fields, such as chemistry, even article abstracts are inaccessible for large-scale analysis. The obvious outcome is that articles published in open-access journals have a better chance of being identified as relevant hits than others appearing in traditional “closed-access” journals. Electronic access to text obviously impacts all stages of text mining. (Figure 1: Major techniques and applications of text mining.) Once the documents have been chosen by an information retrieval engine, a computer scans the text and picks out the various entities (objects, concepts, and symbols) in each sentence. This process, called named-entity recognition [3], draws upon dictionaries of synonyms and homonyms, in addition to machine-learning tools [4], so that an individual entity (say, a protein) is recognized consistently—even though it may be referred to by several different names and acronyms [5]. Named-entity recognition is closely related to the design of controlled terminologies [6] and ontologies for the annotation of texts and experimental data [7]—a process often requiring a monumental community effort [8]. The next step is information extraction (IE) (see pp. 545–559 in [9]). Here, entities are assembled into simple phrases and clauses that capture the meaning of the mined text. To accomplish this, two or more entities are juxtaposed, and meaningful action words—called predicates—are chosen to link the entities. For instance, we might say gene X genetically interacts with gene Y, or protein A binds to protein B. Each completed clause describes a basic relationship between entities. The question then becomes, what can we do with all these simple or complex clauses? The answer is, quite a lot—which helps explain why text mining is poised to become a powerful central pillar in scientific research and recordkeeping.
The lower two-thirds of Figure 1 illustrates how the results of information extraction (IE) can be synthesized and used. Because IE yields a collection of phrases linking entities through predicates, one of its simplest but valuable uses is to answer simple questions posed to an automated system [10]. In this approach, human questions are digested by a linguistic engine (likely using the same process as employed on original mined text) and mapped to simple phrases. These question phrases are then queried against the database of phrases already stored in the computer, which were generated through the application of IE to analyzed text. (Another mode of question answering, bypassing generation and querying of a database entirely, involves direct search and analysis of relevant texts. These texts can be stored at a local computer disk or distributed on numerous computers around the world.) Figure 1 outlines the basic process by which the machine interprets the question, queries its database of stored relationships, and returns an answer. IE-generated knowledge often tracks closely the needs of experimental biologists. Typical IE systems are developed in direct response to acute practical problems, such as large-scale annotation of regulatory regions in genomes [11], collecting published claims about experimental evidence supporting a collection of assertions [12], and condensing sparse information about phenotypic effects of mutations in proteins [13]. Of course, IE-generated databases can be supplemented with additional data gleaned from experiment, or contributed through other non–text-mining means. A simple user interface could facilitate contributing raw experimental data or other information into the database of relationships expressed as simple phrases—again, entities linked by actions (see, for example, the REFLECT system, http://reflect.ws/). Adding more such data should correspondingly increase the effectiveness of the computer's answers to user questions. Another major use for the database of IE-generated phrases is to employ the collection itself for the discovery of new information [14],[15]. One approach to this is to seek out “idea isomorphisms”, by which we mean identifying similar types of logical constructs across different contexts. Finding that similar small ideas (or phrases) occur in different fields might allow researchers to bridge different areas of inquiry. Such bridging of fields, in turn, might uncover new connections, thereby suggesting new and unexpected hypotheses that can then be tested experimentally. The collection of phrases can also be used to vet and prune itself by examining the consistency among many entries. For instance, conflicting or erroneous data can be flagged. By examining each record situated within a large number of records, the preponderance of evidence could assist in identifying and resolving errors. Say, for example, that 20 distinct phrases all indicate that protein A interacts with protein B, and one phrase suggests otherwise; we might probabilistically argue, then, that the lone conflicting statement is false and should be disregarded—unless it is supported some other way. An additional approach to using these phrases—in a mega-scale fashion—is to construct a “map of science”, a global description of the interrelationships between different fields of inquiry. This is similar conceptually to PubNet [16], which highlights connections between authors. 
However, the map of science would be generated not through coauthor relationships but through clustering the underlying scientific fact claims themselves, as represented in the IE phrase collection. To do this, researchers would cluster papers according to their IE-derived phrase content; any two papers can be compared in this way to derive a measure of their similarity and overlap in terms of information content. By repeating this process, researchers could create a distance map of all papers in science, and, along the way, of all the factoids that the information content of the papers themselves comprise. In addition, researchers might track the changing nature of the IE phrases over time to examine the dynamics of scientific belief. This could involve observing as simple phrases themselves change in occurrence or content over time, or we might watch these simple ideas and truth claims crop up in the scientific literature and track their development that way. Finally, the middle right-hand section of Figure 1 depicts a very simple type of analysis involving the IE-generated simple phrase collection. This approach involves simply looking at the phrases' occurrence in the databases, and recording which statements tend to occur more than others. This type of analysis normally generates a kind of power law–type structure, where it becomes apparent that a few phrases occur many times, but most others only occur a few times. Text/literature mining is a powerful approach, one we expect to substantially bolster the scientific reporting and discovery process in coming years. Applying the organizational, storage, and pattern-matching capabilities of modern computers to the vast corpus of scientific information contained in the literature (present, past, and future) will not only transform the vast archives of science into rapid-access searchable computerized data, but no doubt also catalyze the discovery of much new knowledge. We hope that this brief “getting started” report highlights some of the major and promising avenues opening as a result of advances in text mining. Note to the reader: The field of text mining is young and growing rapidly, and our own interests and experiences have in large part shaped our perspective on it. We are constrained by length limits here to (reluctantly) omit several topics, such as text mining in conjunction with image analysis, important community text-annotation efforts, and ontology engineering—each important in its own right. Furthermore, every issue touched upon in this essay comes with a rich diversity of views and approaches in the text-mining community. While we cannot possibly do justice to this complexity, the reader should reject the impression that there is but a single correct way to perform text analysis.
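The extract-store-query loop described above can be sketched in a few lines: pull (entity, predicate, entity) clauses out of sentences with a crude pattern, count them, and answer a simple question by lookup. Real pipelines use named-entity recognition and parsing rather than a regular expression, and the sentences and gene names below are invented.

```python
import re
from collections import Counter

# Crude pattern for "X interacts with Y" / "X binds to Y" style statements; a real
# information extraction engine would use NER and a parser instead of a regex.
PATTERN = re.compile(r"(\w+) (interacts with|binds to|activates) (\w+)")

corpus = [
    "In vitro assays show that RAD51 interacts with BRCA2 in repair foci.",
    "We report that TP53 activates CDKN1A after DNA damage.",
    "Earlier work suggested RAD51 interacts with BRCA2 under stress.",
]

triples = Counter()
for sentence in corpus:
    for subj, predicate, obj in PATTERN.findall(sentence):
        triples[(subj, predicate, obj)] += 1      # occurrence counts support later analysis

def answer(entity):
    """Very small question-answering step: what do we know about this entity?"""
    return [(t, n) for t, n in triples.items() if entity in (t[0], t[2])]

print("stored triples:", dict(triples))
print("what do we know about RAD51?", answer("RAD51"))
```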
- Published
- 2009
38. Taking the lag out of jet lag through model-based schedule design
- Author
-
Daniel B. Forger, Elizabeth B. Klerman, and Dennis A. Dean
- Subjects
Schedule ,Computer science ,Lag ,Environment ,Models, Biological ,Scheduling (computing) ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Genetics ,Humans ,Computer Simulation ,Circadian rhythm ,Wakefulness ,Molecular Biology ,Key schedule ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Simulation ,Lighting ,030304 developmental biology ,Jet Lag Syndrome ,0303 health sciences ,Chronobiology ,Ecology ,Computational Biology ,Circadian Rhythm ,Computational Theory and Mathematics ,lcsh:Biology (General) ,13. Climate action ,Modeling and Simulation ,Computer Science ,Sleep (system call) ,Sleep ,030217 neurology & neurosurgery ,Mathematics ,Algorithms ,Software ,Research Article - Abstract
Travel across multiple time zones results in desynchronization of environmental time cues and the sleep–wake schedule from their normal phase relationships with the endogenous circadian system. Circadian misalignment can result in poor neurobehavioral performance, decreased sleep efficiency, and inappropriately timed physiological signals including gastrointestinal activity and hormone release. Frequent and repeated transmeridian travel is associated with long-term cognitive deficits, and rodents experimentally exposed to repeated schedule shifts have increased death rates. One approach to reduce the short-term circadian, sleep–wake, and performance problems is to use mathematical models of the circadian pacemaker to design countermeasures that rapidly shift the circadian pacemaker to align with the new schedule. In this paper, we demonstrate the use of mathematical models to design sleep–wake and countermeasure schedules for improved performance. We present an approach to designing interventions that combines an algorithm for optimal placement of countermeasures with a novel mode of schedule representation. With these methods, rapid circadian resynchrony and the resulting improvement in neurobehavioral performance can be achieved even after moderate to large shifts in the sleep–wake schedule. The key schedule design inputs are endogenous circadian period length, desired sleep–wake schedule, length of intervention, background light level, and countermeasure strength. The new schedule representation facilitates schedule design, simulation studies, and experiment design and significantly decreases the amount of time needed to design an appropriate intervention. The method presented in this paper has direct implications for designing jet lag, shift-work, and non-24-hour schedules, including scheduling for extreme environments, such as in space, undersea, or in polar regions., Author Summary Traveling across several time zones can cause an individual to experience “jet lag,” which includes trouble sleeping at night and trouble remaining awake during the day. A major cause of these effects is the desynchronization between the body's internal circadian clock and local environmental cues. A well-known intervention to resynchronize an individual's clock with the environment is appropriately timed light exposure. Used as an intervention, properly timed light stimuli can reset an individual's internal circadian clock to align with local time, resulting in more efficient sleep, a decrease in fatigue, and an increase in cognitive performance. The converse is also true: poorly timed light exposure can prolong the resynchronization process. In this paper, we present a computational method for automatically determining the proper placement of these interventional light stimuli. We used this method to simulate shifting sleep–wake schedules (as seen in jet lag situations) and design interventions. Essential to our approach is the use of mathematical models that simulate the body's internal circadian clock and its effect on human performance. Our results include quicker design of multiple schedule alternatives and predictions of substantial performance improvements relative to no intervention. Therefore, our methods allow us to use these models not only to assess schedules but also to interactively design schedules that will result in improved performance.
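As a rough illustration of the kind of schedule search described above, the sketch below uses a deliberately simplified phase-only clock with a sinusoidal phase-response curve to scan candidate daily light-pulse times and pick the one that re-entrains fastest after an 8-hour shift. It is not the detailed limit-cycle model used in the paper, and every parameter value is invented for illustration.

```python
import math

# Toy illustration only: a phase-only circadian clock with a sinusoidal
# phase-response curve (PRC). This is NOT the detailed limit-cycle model
# used in the paper; every parameter value here is invented.

TAU = 24.2            # assumed intrinsic circadian period (h)
PRC_AMPLITUDE = 1.5   # assumed largest phase shift (h) from one light pulse

def phase_shift(pulse_time, lag):
    """Phase shift (h) from a light pulse at local time `pulse_time` when the
    internal clock lags local time by `lag` hours. Toy sinusoidal PRC: the
    sign and size of the shift depend on the internal time of the light."""
    internal_time = (pulse_time - lag) % 24.0
    return PRC_AMPLITUDE * math.sin(2 * math.pi * internal_time / 24.0)

def days_to_entrain(pulse_time, initial_lag=8.0, tol=0.5, max_days=30):
    """Apply one light pulse per day at `pulse_time` and count the days until
    the clock is within `tol` hours of the shifted schedule."""
    lag = initial_lag
    for day in range(1, max_days + 1):
        drift = 24.0 - TAU                     # daily free-running drift (h)
        lag -= phase_shift(pulse_time, lag) + drift
        if abs(lag) < tol:
            return day
    return None                                # did not entrain in time

# Scan all whole-hour pulse times and keep the fastest re-entrainment.
best = min(range(24), key=lambda t: days_to_entrain(t) or 99)
print("best pulse time:", best, "h; days to entrain:", days_to_entrain(best))
```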
- Published
- 2009
39. Maximal extraction of biological information from genetic interaction data
- Author
-
David J. Galas, Gregory W. Carter, and Timothy Galitski
- Subjects
Proteome ,Systems biology ,Information Storage and Retrieval ,Biology ,Machine learning ,computer.software_genre ,Set (abstract data type) ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Gene interaction ,Protein Interaction Mapping ,Genetics ,Computer Simulation ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,0303 health sciences ,Computational Biology/Systems Biology ,Ecology ,Models, Genetic ,business.industry ,Contrast (statistics) ,Gene Annotation ,Genetics and Genomics/Bioinformatics ,Genetic architecture ,Order (biology) ,Computational Theory and Mathematics ,Gene Expression Regulation ,lcsh:Biology (General) ,Modeling and Simulation ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,Biological network ,Algorithms ,Research Article ,Signal Transduction - Abstract
Extraction of all the biological information inherent in large-scale genetic interaction datasets remains a significant challenge for systems biology. The core problem is essentially that of classification of the relationships among phenotypes of mutant strains into biologically informative “rules” of gene interaction. Geneticists have determined such classifications based on insights from biological examples, but it is not clear that there is a systematic, unsupervised way to extract this information. In this paper we describe such a method that depends on maximizing a previously described context-dependent information measure to obtain maximally informative biological networks. We have successfully validated this method on two examples from yeast by demonstrating that more biological information is obtained when analysis is guided by this information measure. The context-dependent information measure is a function only of phenotype data and a set of interaction rules, involving no prior biological knowledge. Analysis of the resulting networks reveals that the most biologically informative networks are those with the greatest context-dependent information scores. We propose that these high-complexity networks reveal genetic architecture at a modular level, in contrast to classical genetic interaction rules that order genes in pathways. We suggest that our analysis represents a powerful, data-driven, and general approach to genetic interaction analysis, with particular potential in the study of mammalian systems in which interactions are complex and gene annotation data are sparse., Author Summary Targeted genetic perturbation is a powerful tool for inferring gene function in model organisms. Functional relationships between genes can be inferred by observing the effects of multiple genetic perturbations in a single strain. The study of these relationships, generally referred to as genetic interactions, is a classic technique for ordering genes in pathways, thereby revealing genetic organization and gene-to-gene information flow. Genetic interaction screens are now being carried out in high-throughput experiments involving tens or hundreds of genes. These data sets have the potential to reveal genetic organization on a large scale, and require computational techniques that best reveal this organization. In this paper, we use a complexity metric based in information theory to determine the maximally informative network given a set of genetic interaction data. We find that networks with high complexity scores yield the most biological information in terms of (i) specific associations between genes and biological functions, and (ii) mapping modules of co-functional genes. This information-based approach is an automated, unsupervised classification of the biological rules underlying observed genetic interactions. It might have particular potential in genetic studies in which interactions are complex and prior gene annotation data are sparse.
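For readers unfamiliar with genetic-interaction “rules”, the sketch below shows the simplest version of the classification task the abstract refers to: assigning a double-mutant phenotype, relative to its single mutants, to a coarse interaction category under a multiplicative null model. It is only an illustration of rule classification; it does not implement the authors' context-dependent information measure, and the fitness values are hypothetical.

```python
# Minimal illustration of genetic-interaction "rule" classification under a
# multiplicative null model. This is NOT the authors' context-dependent
# information measure; it only shows the kind of categorical rules such
# analyses start from. Fitness values are hypothetical.

def classify_interaction(wt, a, b, ab, tol=0.05):
    """Assign a coarse rule from wild-type (wt), single-mutant (a, b), and
    double-mutant (ab) fitness measurements."""
    expected = a * b / wt              # expectation if A and B do not interact
    if abs(ab - expected) <= tol:
        return "non-interacting"
    if abs(ab - a) <= tol:
        return "A epistatic to B"      # A's phenotype masks B's
    if abs(ab - b) <= tol:
        return "B epistatic to A"
    return "synthetic (aggravating)" if ab < expected else "alleviating"

print(classify_interaction(wt=1.0, a=0.8, b=0.9, ab=0.72))  # non-interacting
print(classify_interaction(wt=1.0, a=0.8, b=0.9, ab=0.30))  # synthetic (aggravating)
```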
- Published
- 2009
40. A Review of 2008 for PLoS Computational Biology
- Author
-
Evie Browne, Philip E. Bourne, and Rosemary Dickin
- Subjects
media_common.quotation_subject ,Appeal ,Scientific literature ,Computational biology ,Organic growth ,Cellular and Molecular Neuroscience ,Political science ,Genetics ,Quality (business) ,Molecular Biology ,Publication ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,media_common ,Ecology ,business.industry ,Flexibility (personality) ,Computational Biology ,Time limit ,Social research ,Editorial ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Periodicals as Topic ,business ,Editorial Policies - Abstract
A very Happy New Year to all our authors, readers, editors, and reviewers from everyone at the Public Library of Science! 2008 was a remarkable year for PLoS Computational Biology; which saw 50% more submissions than in 2007 (900 full articles and 175 presubmission inquiries), more than 260 high-quality research articles published, and regular contributions of Editorials, Reviews and Perspectives, and Education and Society pages. This growth and maturity of content leaves no doubt that our Journal has become a leading reference for the field of computational biology and a trusted place to publish. Such success has come through the hard work of our Editors, not only from our Editorial Board but also from the anonymous reviewers and Guest Editors who expend so much time and energy in the assessment of submitted manuscripts (each averaging 2.8 reviews and 1 to 2 rounds of revisions), and from the attention to detail and care taken over the content. Peer review by external experts is essential to ensuring that the work published in PLoS Computational Biology is of the very highest quality, and we are grateful to all of our reviewers for their thoughtful and informed comments. Guest Editors are those who step in to edit one particular paper that describes work in an area of research that falls outside the expertise of the more than 50 volunteer Editors on our Board. The flexibility and availability of these Guest Editors is invaluable in our being able to provide a high level of review, as well as playing an important role in maintaining the broad appeal and vibrancy of the Journal. Their names can be found together in Table S1 as an acknowledgment of the good work they do and the time they donate to improve the body of scientific literature and knowledge. In 2008, our pool of reviewers included approximately 1,300 scientists in 36 countries, including Vietnam, Mexico, Brazil, and Afghanistan, as well as in countries such as Israel, Germany, and Japan, where the Journal is better-known. This impressive geographical spread indicates that we are reaching the best of the best across the scientific world, something only a well-respected journal of quality is able to accomplish. Organic growth requires that we constantly assess both the kinds of papers we accept and the standards of research they represent. We have revised our scope statement to reflect slight changes in our focus (see http://www.ploscompbiol.org/static/information.action), and we constantly refine our Editorial Board (http://www.ploscompbiol.org/static/edboard.action) to handle the number and types of papers we are encouraging. Experiencing solid growth can come at a price to the speed of our Editorial processes, however, and while we aim to provide a decision to our authors within 35 days, some papers defy this time limit. We are confident, however, that with your continued help and support, we will reach our targets more consistently this year. As authors, you appreciate a swift response time, and as reviewers you can help us achieve this by making a commitment in 2009 to return reviews within two weeks. Looking ahead in 2009, you can expect to see not only more great research, but also greater connectivity between content found in different PLoS journals and among members of your community. As an example of the former, PLoS Computational Biology will be working with PLoS ONE to feature developments in software important to our discipline. 
For the latter, the community can read and participate in discussions that start when readers post a comment or rating on a published article. As we have done since our launch, we welcome your feedback on how we're doing and what we should be doing going forward. This is your Journal, and our open philosophy encourages your engagement in it. By working together, we can further establish the importance of our science to our understanding of living systems and make a positive contribution to moving it forward even in these uncertain times. Once again, many thanks to all of you for your support and commitment to making 2008 a successful year for PLoS Computational Biology and to ensuring that we are able to achieve even more in the upcoming year.
- Published
- 2009
41. Sizing Up Allometric Scaling Theory
- Author
-
Van M. Savage, Eric J. Deeds, and Walter Fontana
- Subjects
0106 biological sciences ,Biophysics/Theory and Simulation ,Databases, Factual ,Context (language use) ,010603 evolutionary biology ,01 natural sciences ,Power law ,Models, Biological ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Statistics ,Genetics ,Range (statistics) ,Canonical model ,Animals ,Body Size ,Computer Simulation ,Limit (mathematics) ,Statistical physics ,Molecular Biology ,Scaling ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Mathematics ,Mammals ,0303 health sciences ,Computational Biology/Systems Biology ,Ecology ,Physiology/Cardiovascular Physiology and Circulation ,Computational Biology ,Function (mathematics) ,Physics/General Physics ,Capillaries ,Oxygen ,Metabolism ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Modeling and Simulation ,Exponent ,Mathematics/Statistics ,Blood Flow Velocity ,Research Article - Abstract
Metabolic rate, heart rate, lifespan, and many other physiological properties vary with body mass in systematic and interrelated ways. Present empirical data suggest that these scaling relationships take the form of power laws with exponents that are simple multiples of one quarter. A compelling explanation of this observation was put forward a decade ago by West, Brown, and Enquist (WBE). Their framework elucidates the link between metabolic rate and body mass by focusing on the dynamics and structure of resource distribution networks—the cardiovascular system in the case of mammals. Within this framework the WBE model is based on eight assumptions from which it derives the well-known observed scaling exponent of 3/4. In this paper we clarify that this result only holds in the limit of infinite network size (body mass) and that the actual exponent predicted by the model depends on the sizes of the organisms being studied. Failure to clarify and to explore the nature of this approximation has led to debates about the WBE model that were at cross purposes. We compute analytical expressions for the finite-size corrections to the 3/4 exponent, resulting in a spectrum of scaling exponents as a function of absolute network size. When accounting for these corrections over a size range spanning the eight orders of magnitude observed in mammals, the WBE model predicts a scaling exponent of 0.81, seemingly at odds with data. We then proceed to study the sensitivity of the scaling exponent with respect to variations in several assumptions that underlie the WBE model, always in the context of finite-size corrections. Here too, the trends we derive from the model seem at odds with trends detectable in empirical data. Our work illustrates the utility of the WBE framework in reasoning about allometric scaling, while at the same time suggesting that the current canonical model may need amendments to bring its predictions fully in line with available datasets., Author Summary The rate at which an organism produces energy to live increases with body mass to the 3/4 power. Ten years ago West, Brown, and Enquist posited that this empirical relationship arises from the structure and dynamics of resource distribution networks such as the cardiovascular system. Using assumptions that capture physical and biological constraints, they defined a vascular network model that predicts a 3/4 scaling exponent. In our paper we clarify that this model generates the 3/4 exponent only in the limit of infinitely large organisms. Our calculations indicate that in the finite-size version of the model metabolic rate and body mass are not related by a pure power law, which we show is consistent with available data. We also show that this causes the model to produce scaling exponents significantly larger than the observed 3/4. We investigate how changes in certain assumptions about network structure affect the scaling exponent, leading us to identify discrepancies between available data and the predictions of the finite-size model. This suggests that the model, the data, or both, need reassessment. The challenge lies in pinpointing the physiological and evolutionary factors that constrain the shape of networks driving metabolic scaling.
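The finite-size point can be illustrated generically: when the rate–mass relationship is not a pure power law, the exponent recovered by a straight-line fit in log-log space depends on the mass range examined. The sketch below uses an invented correction term purely to show this range dependence; the direction and magnitude of the deviation are arbitrary, and this is not the WBE network calculation itself.

```python
import numpy as np

# Generic illustration (NOT the WBE network calculation): when metabolic rate
# is not a pure power law of body mass, the exponent recovered by a straight
# line fit in log-log space depends on the mass range examined. The correction
# term below is invented; its direction and size are arbitrary.

def metabolic_rate(mass):
    # a 3/4 power law times a finite-size correction that matters most
    # for small masses (made-up functional form)
    return mass ** 0.75 * (1.0 + 2.0 * mass ** -0.25)

def fitted_exponent(m_lo, m_hi, n=200):
    mass = np.logspace(np.log10(m_lo), np.log10(m_hi), n)
    slope, _ = np.polyfit(np.log(mass), np.log(metabolic_rate(mass)), 1)
    return slope

print(fitted_exponent(1e-2, 1e2))  # small-bodied range: fit deviates from 0.75
print(fitted_exponent(1e2, 1e6))   # large-bodied range: fit approaches 0.75
```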
- Published
- 2008
42. The long and thorny road to publication in quality journals
- Author
-
Thomas C. Erren
- Subjects
Ecology ,Point (typography) ,Operations research ,Computer science ,business.industry ,Interpretation (philosophy) ,media_common.quotation_subject ,Cellular and Molecular Neuroscience ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Publishing ,Modeling and Simulation ,Law ,Sympathy ,Genetics ,Consolation ,Quality (business) ,Empirical evidence ,business ,Molecular Biology ,Publication ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,media_common - Abstract
Within the “Ten Simple Rules” series in PLoS Computational Biology, Dr. Bourne suggests that for younger investigators it is better to publish one paper in a quality journal than multiple papers in lesser journals [1]. While this is certainly advisable, it can be very difficult. Indeed, for young scientists or, more to the point, for researchers with a short record of publications, it may be almost impossible to make their work and themselves visible to a larger scientific community via higher impact journals. A not-too-small share of “seasoned” scientists will argue, without malice, that “we experienced similar or the same” and “good researchers will eventually be recognized.” What they imply is that those who continue to provide good science shall be rewarded later, i.e., their papers will eventually find a home in quality journals, thus yielding better chances that the work will have impact. And yet, a much-cited case study ([2]; cited 264 times as of November 18, 2007, according to http://isiwebofknowledge.com/) may illustrate that the road to publication and recognition can be thorny and long for younger and less-recognized scientists. Indeed, this “experiment” by Peters and Ceci provided empirical evidence 25 years ago that getting a paper accepted for publication can be very difficult for lesser-known scientists from less-recognized institutions. In this study, 12 psychology articles that had already been published by prestigious scientists from prestigious institutions were resubmitted to the journals that had accepted and printed the papers in the first place. Data presentation remained almost unaltered, but fictitious names and not-well-known institutions replaced the original ones. Only three of the resubmissions were identified as such, and of the other nine manuscripts, eight were rejected, mainly for methodological reasons. The Peters and Ceci study was widely discussed, and one interpretation of their observations was that work from lesser-known researchers may be subjected to more critical peer review than material submitted by well-known investigators at institutions with a long track record. To exemplify this notion, 1977 Nobel Laureate Rosalyn Yalow commented on the article by Peters and Ceci: “. . . . I am in full sympathy with rejecting papers from unknown authors working in unknown institutions. How does one know that the data are not fabricated? . . . on the average, the work of established investigators in good institutions is more likely to have had prior review from competent peers and associates even before reaching the journal” [3]. Despite this background, Dr. Bourne is right when he suggests that young investigators should aim for publication in quality journals. After all, you can only score high if you try. But be prepared that it takes very good material and perseverance to publish in well-known journals. Be aware, also, that even the highest-quality work may not see publication in high-impact journals, for numerous reasons, with the novice status of the submitting author(s) likely being a primary one. In this vein, both less and more experienced researchers may want to read the following paper for empirical comfort: “Consolation for the scientist: Sometimes it is hard to publish papers that are later highly cited” [4].
- Published
- 2007
43. Are there rearrangement hotspots in the human genome?
- Author
-
Max A. Alekseyev and Pavel A. Pevzner
- Subjects
Genome evolution ,De facto ,Molecular Sequence Data ,Genomics ,Biology ,Genome ,Evolution, Molecular ,Cellular and Molecular Neuroscience ,Homo (Human) ,Genetics ,Humans ,Computer Simulation ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Gene Rearrangement ,Evolutionary Biology ,Base Sequence ,Models, Genetic ,Ecology ,Genome, Human ,Chromosome Mapping ,Genetic Variation ,Computational Biology ,Chromosome ,Gene rearrangement ,Mus (Mouse) ,Gene deletion ,Computational Theory and Mathematics ,lcsh:Biology (General) ,Evolutionary biology ,Modeling and Simulation ,Human genome ,Research Article - Abstract
In a landmark paper, Nadeau and Taylor [18] formulated the random breakage model (RBM) of chromosome evolution that postulates that there are no rearrangement hotspots in the human genome. In the next two decades, numerous studies with progressively increasing levels of resolution made RBM the de facto theory of chromosome evolution. Despite the fact that RBM had prophetic prediction power, it was recently refuted by Pevzner and Tesler [4], who introduced the fragile breakage model (FBM), postulating that the human genome is a mosaic of solid regions (with low propensity for rearrangements) and fragile regions (rearrangement hotspots). However, the rebuttal of RBM caused a controversy and led to a split among researchers studying genome evolution. In particular, it remains unclear whether some complex rearrangements (e.g., transpositions) can create an appearance of rearrangement hotspots. We contribute to the ongoing debate by analyzing multi-break rearrangements that break a genome into multiple fragments and further glue them together in a new order. In particular, we demonstrate that (1) even if transpositions were a dominant force in mammalian evolution, the arguments in favor of FBM still stand, and (2) the “gene deletion” argument against FBM is flawed., Author Summary Rearrangements are genomic “earthquakes” that change chromosomal architectures. The fundamental question in molecular evolution is whether there exist “chromosomal faults” (rearrangement hotspots) where rearrangements happen over and over again. The random breakage model (RBM) postulates that rearrangements are “random,” and thus there are no rearrangement hotspots in mammalian genomes. RBM was proposed by Susumu Ohno in 1970 and later was formalized by Nadeau and Taylor in 1984. It was embraced by biologists from the very beginning due to its prophetic prediction power, and only in 2003 was it refuted by Pevzner and Tesler, who suggested an alternative fragile breakage model (FBM) of chromosome evolution. However, the rebuttal of RBM caused a controversy, and in 2004, Sankoff and Trinh gave a rebuttal of the rebuttal of RBM. This led to a split among researchers studying chromosome evolution: while most recent studies support the existence of rearrangement hotspots, others feel that further analysis is needed to resolve the validity of RBM. In this paper, we develop a theory for analyzing complex rearrangements (including transpositions) and demonstrate that even if transpositions were a dominant evolutionary force, there are still rearrangement hotspots in mammalian genomes.
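As background to the breakpoint arguments above, the sketch below counts breakpoints between two toy genomes written as signed permutations of shared synteny blocks; an adjacency present in one genome but absent (in either orientation) from the other is a breakpoint. This is only the elementary bookkeeping step, not the multi-break rearrangement theory developed in the paper, and the block orders are invented.

```python
# Minimal sketch of breakpoint counting between two unichromosomal genomes
# written as signed permutations of shared synteny blocks. This is only the
# elementary bookkeeping, not the paper's multi-break rearrangement theory.

def adjacencies(genome):
    """Ordered pairs of neighbouring blocks, with 0 marking chromosome ends."""
    extended = [0] + list(genome) + [0]
    return {(extended[i], extended[i + 1]) for i in range(len(extended) - 1)}

def breakpoints(genome_a, genome_b):
    """Adjacencies of genome_a missing from genome_b in either orientation."""
    adj_b = adjacencies(genome_b)
    adj_b |= {(-y, -x) for x, y in adj_b}   # same adjacency read in reverse
    return sum(1 for adj in adjacencies(genome_a) if adj not in adj_b)

genome_1 = [1, 2, 3, 4, 5]
genome_2 = [1, -3, -2, 4, 5]                # blocks 2..3 inverted
print(breakpoints(genome_1, genome_2))      # 2: the two ends of the inversion
```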
- Published
- 2007
44. 'Simple Rules for Editors'? Here Is One Rule to Tackle Neglected Problems of Publishing: Response from PLoS
- Author
-
Mark Patterson and Barbara Cohen
- Subjects
Ecology ,Competing interests ,Point (typography) ,Inclusion (disability rights) ,business.industry ,Statement (logic) ,Cellular and Molecular Neuroscience ,lcsh:Biology (General) ,Computational Theory and Mathematics ,Publishing ,Modeling and Simulation ,Political science ,Correspondence ,Genetics ,Publication ethics ,Engineering ethics ,Medical journal ,business ,lcsh:QH301-705.5 ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Simple (philosophy) - Abstract
We are grateful to the Drs. Erren for highlighting the concern that younger and less-experienced scientists are sometimes not given due credit for their contributions, in particular through omission from authorship on published articles. The Drs. Erren correctly point out that many journals, including those published by PLoS, do ask for all author contributions to be described on submission, and these contributions are also included in the published article. The corresponding author has ultimate responsibility for ensuring that these contributions are correct. Authorship issues are a topic of considerable debate among editorial groups such as the International Committee of Medical Journal Editors (ICMJE) and the Committee on Publication Ethics (COPE). PLoS Medicine, for example, requires that each author on a paper respond to a specific e-mail to independently confirm their contribution to the work and whether they have any competing interests. No final decision is made on the paper until all authors have responded to that e-mail. This policy aims to ensure that all listed authors are able to justify their inclusion and to double-check for competing interests. Although it is not the role of editors to arbitrate authorship, on two occasions authors have agreed after receiving this e-mail request that they do not fulfil the criteria for authorship and have requested that they simply be acknowledged within the paper instead. In addition, PLoS Medicine specifically reminds authors on submission that all medical writers must be included on the paper with their contributions. PLoS is committed to ensuring that the byline on papers is correct and that the contribution statement accurately describes the contributions of authors. One way to reinforce to authors the importance of accurate authorship details is to strengthen our statement on submission forms and in author instructions for all journals, and to remind corresponding authors that they are ultimately responsible for confirming that no additional authors should be listed and that author contributions are accurate. It will also be interesting to see whether issues relating to authorship will be addressed by the commentary and annotation features that are available in the new journal PLoS ONE and that will be applied to other PLoS journals in due course.
- Published
- 2007
45. New maximum likelihood estimators for eukaryotic intron evolution
- Author
-
Maki Yoshihama, Hung D. Nguyen, and Naoya Kenmochi
- Subjects
Yeast and Fungi ,Maximum Likelihood ,Lineage (genetic) ,Evolution ,Eukaryotes ,Base pair ,Maximum likelihood ,Biology ,Cellular and Molecular Neuroscience ,Correspondence ,Genetics ,Animals ,Molecular Biology ,lcsh:QH301-705.5 ,Ecology, Evolution, Behavior and Systematics ,Ecology ,Human evolutionary genetics ,Intron ,Estimator ,Plants ,Intron Evolution ,Computational Theory and Mathematics ,Target site ,lcsh:Biology (General) ,Evolutionary biology ,Modeling and Simulation ,Parallel evolution ,Bioinformatics - Computational Biology ,Research Article - Abstract
The evolution of spliceosomal introns remains poorly understood. Although many approaches have been used to infer intron evolution from the patterns of intron position conservation, the results to date have been contradictory. In this paper, we address the problem using a novel maximum likelihood method, which allows estimation of the frequency of intron insertion target sites, together with the rates of intron gain and loss. We analyzed the pattern of 10,044 introns (7,221 intron positions) in the conserved regions of 684 sets of orthologs from seven eukaryotes. We determined that there is an average of one target site per 11.86 base pairs (bp) (95% confidence interval, 9.27 to 14.39 bp). In addition, our results showed that: (i) overall intron gains are ~25% greater than intron losses, although specific patterns vary with time and lineage; (ii) parallel gains account for ~18.5% of shared intron positions; and (iii) reacquisition following loss accounts for ~0.5% of all intron positions. Our results should assist in resolving the long-standing problem of inferring the evolution of spliceosomal introns., Synopsis When did spliceosomal introns originate, and what is their role? These questions are the central subject of the introns-early versus introns-late debate. Inference of intron evolution from the pattern of intron position conservation is vital for resolving this debate. So far, different methods of two approaches, maximum parsimony (MP) and maximum likelihood (ML), have been developed, but the results are contradictory. The differences between previous ML results are due predominantly to differing assumptions concerning the frequency of target sites for intron insertion. This paper describes a new ML method that treats this frequency as a parameter requiring optimization. Using the pattern of intron position in conserved regions of 684 clusters of gene orthologs from seven eukaryotes, the authors found that, on average, there is one target site per ~12 base pairs. The results of intron evolution inferred using this optimal frequency are more definitive than previous ML results. Since the ML method is preferred to the MP one for large datasets, the current results should be the most reliable ones to date. The results show that during the course of evolution there have been slightly more intron gains than losses, and thus they favor introns-late. These results should shed new light on our understanding of intron evolution.
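To illustrate the generic maximum likelihood machinery involved (though not the seven-species, target-site model of the paper), the sketch below fits gain and loss probabilities on a single branch from made-up presence/absence counts by numerically minimizing a negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of generic ML machinery only (NOT the seven-species,
# target-site model of the paper): estimate intron gain and loss
# probabilities on a single branch from invented presence/absence counts,
# treating intron positions as independent.

# counts of positions: absent->absent, absent->present (gain),
# present->present, present->absent (loss); all numbers are made up
n_00, n_01, n_11, n_10 = 5000, 120, 900, 60

def neg_log_lik(params):
    gain, loss = params
    if not (0.0 < gain < 1.0 and 0.0 < loss < 1.0):
        return np.inf
    return -(n_01 * np.log(gain) + n_00 * np.log(1.0 - gain)
             + n_10 * np.log(loss) + n_11 * np.log(1.0 - loss))

result = minimize(neg_log_lik, x0=[0.01, 0.01], method="Nelder-Mead")
gain_hat, loss_hat = result.x
print(f"gain ~ {gain_hat:.4f}, loss ~ {loss_hat:.4f}")
# analytic MLEs for this toy model: 120/5120 ~ 0.0234 and 60/960 = 0.0625
```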
- Published
- 2005
46. Allele-specific amplification in cancer revealed by SNP array analysis
- Author
-
William R. Sellers, Cheng Li, Rameen Beroukhim, David P. Harrington, Thomas LaFramboise, Xiaojun Zhao, Barbara A. Weir, and Matthew Meyerson
- Subjects
Gene Dosage ,Copy number analysis ,Single-nucleotide polymorphism ,Biology ,Molecular Inversion Probe ,Polymorphism, Single Nucleotide ,Loss of heterozygosity ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Cell Line, Tumor ,Neoplasms ,Homo (Human) ,Genotype ,Genetics ,Chromosomes, Human ,Humans ,Genotyping ,Molecular Biology ,lcsh:QH301-705.5 ,Alleles ,Ecology, Evolution, Behavior and Systematics ,Oligonucleotide Array Sequence Analysis ,Cancer Biology ,030304 developmental biology ,0303 health sciences ,Models, Genetic ,Ecology ,Statistics ,Haplotype ,Gene Amplification ,DNA, Neoplasm ,3. Good health ,ErbB Receptors ,Gene Expression Regulation, Neoplastic ,Haplotypes ,Computational Theory and Mathematics ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,Modeling and Simulation ,Bioinformatics - Computational Biology ,Research Article ,SNP array - Abstract
Amplification, deletion, and loss of heterozygosity of genomic DNA are hallmarks of cancer. In recent years a variety of studies have emerged measuring total chromosomal copy number at increasingly high resolution. Similarly, loss-of-heterozygosity events have been finely mapped using high-throughput genotyping technologies. We have developed a probe-level allele-specific quantitation procedure that extracts both copy number and allelotype information from single nucleotide polymorphism (SNP) array data to arrive at allele-specific copy number across the genome. Our approach applies an expectation-maximization algorithm to a model derived from a novel classification of SNP array probes. This method is the first to our knowledge that is able to (a) determine the generalized genotype of aberrant samples at each SNP site (e.g., CCCCT at an amplified site), and (b) infer the copy number of each parental chromosome across the genome. With this method, we are able to determine not just where amplifications and deletions occur, but also the haplotype of the region being amplified or deleted. The merit of our model and general approach is demonstrated by very precise genotyping of normal samples, and our allele-specific copy number inferences are validated using PCR experiments. Applying our method to a collection of lung cancer samples, we are able to conclude that amplification is essentially monoallelic, as would be expected under the mechanisms currently believed responsible for gene amplification. This suggests that a specific parental chromosome may be targeted for amplification, whether because of germ line or somatic variation. An R software package containing the methods described in this paper is freely available at http://genome.dfci.harvard.edu/~tlaframb/PLASQ., Synopsis Human cancer is driven by the acquisition of genomic alterations. These alterations include amplifications and deletions of portions of one or both chromosomes in the cell. The localization of such copy number changes is an important pursuit in cancer genomics research because amplifications frequently harbor cancer-causing oncogenes, while deleted regions often contain tumor-suppressor genes. In this paper the authors present an expectation-maximization-based procedure that, when applied to data from single nucleotide polymorphism arrays, estimates not only total copy number at high resolution across the genome, but also the contribution of each parental chromosome to copy number. Applying this approach to data from over 100 lung cancer samples the authors find that, in essentially all cases, amplification is monoallelic. That is, only one of the two parental chromosomes contributes to the copy number elevation in each amplified region. This phenomenon makes possible the identification of haplotypes, or patterns of single nucleotide polymorphism alleles, that may serve as markers for the tumor-inducing genetic variants being targeted.
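The “generalized genotype” idea can be illustrated in a few lines: for a given total copy number, enumerate the possible allele-specific splits and choose the one whose expected allele fraction best matches an observed B-allele fraction. This toy selection step is not the probe-level expectation-maximization procedure (PLASQ) described in the paper; the numbers are hypothetical.

```python
# Small illustration of the "generalized genotype" idea only; this is NOT the
# probe-level expectation-maximization procedure (PLASQ) described in the
# paper. Given a total copy number and an observed B-allele fraction at a SNP,
# enumerate the possible allele-specific splits and keep the one whose
# expected fraction is closest to the observation.

def generalized_genotype(total_copies, observed_b_fraction, alleles=("A", "B")):
    best_n_b, best_err = 0, float("inf")
    for n_b in range(total_copies + 1):
        expected = n_b / total_copies
        err = abs(expected - observed_b_fraction)
        if err < best_err:
            best_n_b, best_err = n_b, err
    n_a = total_copies - best_n_b
    return alleles[0] * n_a + alleles[1] * best_n_b

# e.g., five copies with ~20% of reads supporting the T allele -> "CCCCT"
print(generalized_genotype(5, 0.22, alleles=("C", "T")))
```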
- Published
- 2005
47. Ten simple rules for partnering with K–12 teachers to support broader impact goals
- Author
-
Alexa R. Warwick, Angela Kolonich, Kristin M. Bass, Louise S. Mead, Frieda Reichsman, and Russell Schwartz
- Subjects
lcsh:Biology (General) ,ComputingMilieux_COMPUTERSANDEDUCATION ,lcsh:QH301-705.5 - Abstract
Contributing to broader impacts is an important aspect of scientific research. Engaging practicing K–12 teachers as part of a research project can be an effective approach for addressing broader impacts requirements of grants, while also advancing researcher and teacher professional growth. Our focus is on leveraging teachers’ professional expertise to develop science education materials grounded in emerging scientific research. In this paper, we describe ten simple rules for planning, implementing, and evaluating teacher engagement to support the broader impact goals of your research project. These collaborations can lead to the development of instructional materials or activities for students in the classroom or provide science research opportunities for teachers. We share our successes and lessons learned while collaborating with high school biology teachers to create technology-based, instructional materials developed from basic biological research. The rules we describe are applicable across teacher partnerships at any grade level in that they emphasize eliciting and respecting teachers’ professionalism and expertise.
- Published
- 2020
48. Ten simple rules for supporting a temporary online pivot in higher education
- Author
-
Emily Nordmann, Chiara Horlin, Jacqui Hutchison, Jo-Anne Murray, Louise Robson, Michael K. Seery, Jill R. D. MacKay, and Russell Schwartz
- Subjects
lcsh:Biology (General) ,lcsh:QH301-705.5 - Abstract
As continued COVID-19 disruption looks likely across the world, perhaps until 2021, contingency plans are evolving in case of further disruption in the 2020–2021 academic year. This includes delivering face-to-face programs fully online for at least part of the upcoming academic year for new and continuing cohorts. This temporary pivot will necessitate distance teaching and learning across almost every conceivable pedagogy, from fundamental degrees to professionally accredited ones. Each institution, program, and course will have its own myriad of individualized needs; however, there is a common question that unites us all: how do we provide teaching and assessment to students in a manner that is accessible, fair, equitable, and provides the best learning whilst acknowledging the temporary nature of the pivot? No “one size fits all” solution exists, and many of the choices that need to be made will be far from simple; however, this paper provides a starting point and basic principles to facilitate discussions taking place around the globe by balancing what we know from the pedagogy of online learning with the practicalities imposed by this crisis and any future crises.
- Published
- 2020
49. Model diagnostics and refinement for phylodynamic models
- Author
-
Colin J. Worby, Gavin J. Gibson, Bryan T. Grenfell, and Max S. Y. Lau
- Subjects
0301 basic medicine ,Computer science ,Calibration (statistics) ,Genomic data ,Fmd virus ,Diagnostic tools ,Machine learning ,computer.software_genre ,Disease Outbreaks ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Genetics ,Animals ,Humans ,Computer Simulation ,Molecular Biology ,lcsh:QH301-705.5 ,Phylogeny ,Ecology, Evolution, Behavior and Systematics ,Molecular Epidemiology ,Models, Statistical ,Ecology ,business.industry ,Event (computing) ,Computational Biology ,Statistical model ,030104 developmental biology ,Computational Theory and Mathematics ,Sampling distribution ,lcsh:Biology (General) ,Modeling and Simulation ,Viruses ,Artificial intelligence ,Construct (philosophy) ,business ,computer ,030217 neurology & neurosurgery - Abstract
Phylodynamic modelling, which studies the joint dynamics of epidemiological and evolutionary processes, has made significant progress in recent years due to increasingly available genomic data and advances in statistical modelling. These advances have greatly improved our understanding of the transmission dynamics of many important pathogens. Nevertheless, there remains a lack of effective, targeted diagnostic tools for systematically detecting model mis-specification. Development of such tools is essential for model criticism, refinement, and calibration. The idea of utilising latent residuals for model assessment has already been exploited in general spatio-temporal epidemiological settings. Specifically, by proposing appropriately designed, non-centered re-parameterizations of a given epidemiological process, one can construct latent residuals with known sampling distributions which can be used to quantify evidence of model mis-specification. In this paper, we extend this idea to formulate a novel model-diagnostic framework for phylodynamic models. Using simulated examples, we show that our framework can effectively detect a particular form of mis-specification in a phylodynamic model, particularly in the event of superspreading. We also exemplify our approach by applying the framework to a dataset describing a local foot-and-mouth disease (FMD) outbreak in the UK, eliciting strong evidence against the assumption of no within-host diversity in the outbreak. We further demonstrate that our framework can facilitate model calibration in real-life scenarios by proposing a within-host-diversity model which appears to offer a better fit to the data than one that assumes no within-host diversity of the FMD virus.
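The residual idea generalizes beyond phylodynamics, and a generic sketch may help: if observations are pushed through the cumulative distribution function of the fitted model, the resulting residuals should be Uniform(0,1) when the model is correctly specified, so a test for uniformity quantifies evidence of mis-specification. The example below fits an exponential model to heavier-tailed (superspreading-like) data; the models and numbers are illustrative only, not the non-centred parameterizations used in the paper.

```python
import numpy as np
from scipy import stats

# Generic illustration of the latent-residual idea (NOT the non-centred
# parameterizations used for phylodynamic models): if the fitted model is
# correct, pushing each observation through the model's CDF gives residuals
# that are Uniform(0,1); departure from uniformity is evidence of
# mis-specification. All models and numbers below are invented.

rng = np.random.default_rng(0)

fitted = stats.expon(scale=3.0)                         # fitted model: mean 3
observed = rng.gamma(shape=0.3, scale=10.0, size=500)   # heavier-tailed truth

residuals = fitted.cdf(observed)        # probability integral transform
ks_stat, p_value = stats.kstest(residuals, "uniform")
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.2e}")
# a tiny p-value says the residuals are not Uniform(0,1), i.e. the
# exponential model is mis-specified for these (superspreading-like) data
```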
- Published
- 2019
50. Identification of pathways associated with chemosensitivity through network embedding
- Author
-
Junmei Cairns, Saurabh Sinha, Jian Peng, Sheng Wang, Liewei Wang, and Edward W. Huang
- Subjects
0301 basic medicine ,Proteomics ,Integrins ,Network embedding ,Gene Identification and Analysis ,Gene Expression ,Genetic Networks ,Biochemistry ,0302 clinical medicine ,Cell Signaling ,Gene expression ,Basic Cancer Research ,Medicine and Health Sciences ,Gene Regulatory Networks ,Vector (molecular biology) ,lcsh:QH301-705.5 ,Genetics ,0303 health sciences ,Ecology ,Pharmaceutics ,Genomics ,Phenotype ,Extracellular Matrix ,Computational Theory and Mathematics ,Oncology ,Modeling and Simulation ,030220 oncology & carcinogenesis ,Identification (biology) ,Protein Interaction Networks ,Cellular Structures and Organelles ,Network Analysis ,Research Article ,Signal Transduction ,Computer and Information Sciences ,Computational biology ,Biology ,Protein–protein interaction ,Biological pathway ,Cellular and Molecular Neuroscience ,03 medical and health sciences ,Cancer Genomics ,Genomic Medicine ,Drug Therapy ,Cell Adhesion ,Humans ,Set (psychology) ,Protein Interactions ,Molecular Biology ,Gene ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Oncogenic Signaling ,Gene Expression Profiling ,Biology and Life Sciences ,Proteins ,Computational Biology ,Cell Biology ,030104 developmental biology ,lcsh:Biology (General) ,Protein-Protein Interactions ,Drug Resistance, Neoplasm ,Pharmacogenomics ,Drug Screening Assays, Antitumor ,030217 neurology & neurosurgery - Abstract
Basal gene expression levels have been shown to be predictive of cellular response to cytotoxic treatments. However, such analyses do not fully reveal complex genotype-phenotype relationships, which are partly encoded in highly interconnected molecular networks. Biological pathways provide a complementary way of understanding drug response variation among individuals. In this study, we integrate chemosensitivity data from a large-scale pharmacogenomics study with basal gene expression data from the CCLE project and prior knowledge of molecular networks to identify specific pathways mediating chemical response. We first develop a computational method called PACER, which ranks pathways for enrichment in a given set of genes using a novel network embedding method. It examines a molecular network that encodes known gene-gene as well as gene-pathway relationships, and determines a vector representation of each gene and pathway in the same low-dimensional vector space. The relevance of a pathway to the given gene set is then captured by the similarity between the pathway vector and gene vectors. To apply this approach to chemosensitivity data, we identify genes whose basal expression levels in a panel of cell lines are correlated with cytotoxic response to a compound, and then rank pathways for relevance to these response-correlated genes using PACER. Extensive evaluation of this approach on benchmarks constructed from databases of compound target genes and large collections of drug response signatures demonstrates its advantages in identifying compound-pathway associations compared to existing statistical methods of pathway enrichment analysis. The associations identified by PACER can serve as testable hypotheses on chemosensitivity pathways and help further study the mechanisms of action of specific cytotoxic drugs. More broadly, PACER represents a novel technique of identifying enriched properties of any gene set of interest while also taking into account networks of known gene-gene relationships and interactions., Author summary Gene expression levels have been used to study the cellular response to drug treatments. However, analysis of gene expression without considering gene interactions cannot fully reveal complex genotype-phenotype relationships. Biological pathways reveal the interactions among genes, thus providing a complementary way of understanding the drug response variation among individuals. In this paper, we aim to identify pathways that mediate the chemical response of each drug. We used the recently generated CTRP pharmacogenomics data and CCLE basal expression data to identify these pathways. We showed that using the prior knowledge encoded in molecular networks substantially improves pathway identification. In particular, we integrate genes and pathways into a large heterogeneous network in which links are protein-protein interactions and gene-pathway affiliations. We then project this heterogeneous network onto a low-dimensional space, which enables more precise similarity measurements between pathways and drug-response-correlated genes. Extensive experiments on two benchmarks show that our method substantially improved the pathway identification performance by using the molecular networks. More importantly, our method represents a novel technique of identifying enriched properties of any gene set of interest while also taking into account networks of known gene-gene relationships and interactions.
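The final ranking step described above can be sketched compactly: assuming gene and pathway vectors have already been produced by some network embedding (the core of PACER, not reproduced here), each pathway is scored by its average cosine similarity to the drug's response-correlated genes. All vectors and names below are made up for illustration.

```python
import numpy as np

# Minimal sketch of the final ranking step only. The network embedding itself
# (the core of PACER) is assumed to have been computed already; here the gene
# and pathway vectors are random stand-ins, and all names are made up.

rng = np.random.default_rng(1)
dim = 16
gene_vecs = {g: rng.normal(size=dim) for g in ["ITGB1", "FN1", "COL1A1", "TP53"]}
pathway_vecs = {p: rng.normal(size=dim)
                for p in ["Integrin signalling", "ECM organisation", "p53 pathway"]}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_pathways(correlated_genes, gene_vecs, pathway_vecs):
    """Score each pathway by mean cosine similarity to the given genes."""
    scores = {p: float(np.mean([cosine(gene_vecs[g], v) for g in correlated_genes]))
              for p, v in pathway_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# genes whose basal expression correlates with response to a hypothetical drug
for pathway, score in rank_pathways(["ITGB1", "FN1", "COL1A1"], gene_vecs, pathway_vecs):
    print(f"{score:+.3f}  {pathway}")
```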
- Published
- 2019