10,544 results
Search Results
202. Generation of Binary Tree-Child phylogenetic networks.
- Author
Cardona, Gabriel, Pons, Joan Carles, and Scornavacca, Celine
- Subjects
BOTANY, PHYSICAL sciences, BINARY number system, LIFE sciences, PLANT anatomy, GRAPH theory
- Abstract
Phylogenetic networks generalize phylogenetic trees by allowing the modelling of reticulate evolution events. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way, via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees and are biologically relevant. Since our solution is recursive, it also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutionary history of a set of sequences, represented by a BTC network, to include a new sequence. An implementation in Python of the algorithms described in this paper, along with some computational experiments, can be downloaded from . [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
203. Bayesian back-calculation and nowcasting for line list data during the COVID-19 pandemic.
- Author
Li, Tenglong and White, Laura F.
- Subjects
COVID-19 pandemic, PANDEMICS, COVID-19, COMMUNICABLE diseases, SYMPTOMS, CASE goods, TREATMENT delay (Medicine), REPRODUCTION
- Abstract
Surveillance is critical to mounting an appropriate and effective response to pandemics. However, aggregated case report data suffers from reporting delays and can lead to misleading inferences. Different from aggregated case report data, line list data is a table containing individual features, such as the dates of symptom onset and of reporting for each reported case, and is a good source for modeling delays. Current methods for modeling reporting delays are not particularly appropriate for line list data, which typically has missing symptom onset dates that are non-ignorable for modeling reporting delays. In this paper, we develop a Bayesian approach that dynamically integrates imputation and estimation for line list data. Specifically, this Bayesian approach can accurately estimate the epidemic curve and instantaneous reproduction numbers, even with most symptom onset dates missing. The Bayesian approach is also robust to deviations from model assumptions, such as changes in the reporting delay distribution or incorrect specification of the maximum reporting delay. We apply the Bayesian approach to COVID-19 line list data in Massachusetts and find that the reproduction number estimates correspond more closely to the control measures than the estimates based on the reported curve. Author summary: Interventions meant to control infectious diseases are often determined and judged using surveillance data on the number of new cases of disease. In many diseases, there are substantial delays between the time when an individual is infected or shows symptoms and when the case is actually reported to a public health authority, such as the CDC. This reported data often includes symptom onset dates for only some individuals. In this paper, we describe a method that imputes missing onset dates for all individuals and recreates the history of the disease progression in a population according to symptom onset dates, which are the best observable proxy available for infection dates. Our method also estimates the instantaneous reproduction number and is robust to many deviations from the assumptions of the model. We show, using a COVID-19 dataset from Massachusetts, that our method accurately follows the implementation of control measures in the state. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
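As a companion to result 203: the core back-calculation idea (shift each reported case back in time by a sampled reporting delay to approximate the onset curve) can be illustrated in a few lines. This sketch is not the paper's Bayesian model; the synthetic line list, the Poisson delay distribution, and all parameter values below are invented purely for illustration.

```python
# Toy illustration of back-calculating an onset curve from report dates.
# NOT the paper's Bayesian model: delay distribution and parameters are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical line list: day of report for each case (days 0..59).
report_day = rng.integers(0, 60, size=500)

# Assume a reporting delay (onset -> report) with mean 4 days, modelled here
# as Poisson purely for simplicity.
mean_delay = 4.0

def impute_onsets(report_day, mean_delay, n_draws=200):
    """Impute onset days by subtracting sampled delays, averaged over draws."""
    horizon = report_day.max() + 1
    curve = np.zeros(horizon)
    for _ in range(n_draws):
        delays = rng.poisson(mean_delay, size=report_day.size)
        onsets = np.clip(report_day - delays, 0, horizon - 1)
        curve += np.bincount(onsets, minlength=horizon)
    return curve / n_draws  # expected number of onsets per day

onset_curve = impute_onsets(report_day, mean_delay)
print(onset_curve[:10])
```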
204. COVID-19 modeling and non-pharmaceutical interventions in an outpatient dialysis unit.
- Author
Jang, Hankyu, Polgreen, Philip M., Segre, Alberto M., and Pemmaraju, Sriram V.
- Subjects
COVID-19, COVID-19 pandemic, MEDICAL personnel, N95 respirators, PANDEMICS, SURGICAL equipment, VIRAL shedding
- Abstract
This paper describes a data-driven simulation study that explores the relative impact of several low-cost and practical non-pharmaceutical interventions on the spread of COVID-19 in an outpatient hospital dialysis unit. The interventions considered include: (i) voluntary self-isolation of healthcare personnel (HCPs) with symptoms; (ii) a program of active syndromic surveillance and compulsory isolation of HCPs; (iii) the use of masks or respirators by patients and HCPs; (iv) improved social distancing among HCPs; (v) increased physical separation of dialysis stations; and (vi) patient isolation combined with preemptive isolation of exposed HCPs. Our simulations show that under conditions that existed prior to the COVID-19 outbreak, extremely high rates of COVID-19 infection can occur in a dialysis unit. In simulations under worst-case modeling assumptions, a combination of relatively inexpensive interventions such as requiring surgical masks for everyone, encouraging social distancing between HCPs, slightly increasing the physical distance between dialysis stations, and—once the first symptomatic patient is detected—isolating that patient, replacing the HCP having had the most exposure to that patient, and relatively short-term use of N95 respirators by other HCPs can lead to a substantial reduction in both the attack rate and the likelihood of any spread beyond patient zero. For example, in a scenario with R0 = 3.0, 60% presymptomatic viral shedding, and a dialysis patient being the infection source, the attack rate falls from 87.8% at baseline to 34.6% with this intervention bundle. Furthermore, the likelihood of having no additional infections increases from 6.2% at baseline to 32.4% with this intervention bundle. Author summary: As we write this, the COVID-19 pandemic has essentially taken over the world, with more than 20 million cases spread over 216 countries. A big concern for policy makers all across the world has been the impact of COVID-19 on healthcare systems and whether these systems can cope with the enormous strain placed on them by COVID-19. In this paper, we consider the spread of COVID-19 in a specific healthcare setting: the outpatient dialysis unit. Hemodialysis patients are extremely vulnerable to infections in large part due to multiple immune-system deficiencies associated with renal failure and hemodialysis. Hemodialysis facilities also increase the risk of COVID-19 transmission because each patient is in frequent, close contact with other patients and healthcare personnel. Thus, a dialysis unit can be seen as a microcosm for the worst-case impacts of COVID-19 in a healthcare setting. In this manuscript, we show via high-fidelity modeling and simulations that under pessimistic modeling assumptions, there is a combination of relatively simple, inexpensive, and practical non-pharmaceutical interventions that can substantially lower the impact of COVID-19 in the dialysis unit. Our simulations are based on fine-grained healthcare personnel movement data that we make available for other modelers to use. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
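For orientation on result 204 only: the relationship between R0 and the final attack rate in a small closed unit can be illustrated with a homogeneous-mixing chain-binomial (Reed-Frost) toy. This is drastically simpler than the paper's data-driven, contact-level simulation and reproduces none of its interventions; the unit size, R0, and the per-contact parameterisation below are illustrative assumptions.

```python
# Highly simplified Reed-Frost chain-binomial sketch of an outbreak in a
# small closed unit. The paper's simulations are far more detailed (movement
# data, shedding profiles, intervention bundles); parameters here are made up.
import numpy as np

rng = np.random.default_rng(1)

def reed_frost_attack_rate(n=40, r0=3.0, n_sims=5000):
    """Mean final attack rate among the initially susceptible, one index case."""
    p = r0 / (n - 1)          # crude per-pair transmission parameterisation
    attack = []
    for _ in range(n_sims):
        s, i = n - 1, 1
        infected_total = 0
        while i > 0 and s > 0:
            # each susceptible escapes all current infectives independently
            p_inf = 1.0 - (1.0 - p) ** i
            new_i = rng.binomial(s, p_inf)
            infected_total += new_i
            s -= new_i
            i = new_i
        attack.append(infected_total / (n - 1))
    return float(np.mean(attack))

print(reed_frost_attack_rate())
```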
205. Modelling locust foraging: How and why food affects group formation.
- Author
Georgiou, Fillipe, Buhl, Jerome, Green, J. E. F., Lamichhane, Bishnu, and Thamwattana, Ngamta
- Subjects
GROUP formation, LOCUSTS, PARTIAL differential equations, LOCAL foods, MARCHING bands
- Abstract
Locusts are short-horned grasshoppers that exhibit two behaviour types depending on their local population density: solitarious, where they will actively avoid other locusts, and gregarious, where they will seek them out. It is in this gregarious state that locusts can form massive and destructive flying swarms or plagues. However, these swarms are usually preceded by the aggregation of juvenile wingless locust nymphs. In this paper we attempt to understand how the distribution of food resources affects the group formation process. We do this by introducing a multi-population partial differential equation model that includes non-local locust interactions, local locust and food interactions, and gregarisation. Our results suggest that food acts to increase the maximum density of locust groups, lowers the percentage of the population that needs to be gregarious for group formation, and decreases both the required density of locusts and the time for group formation around an optimal food width. Finally, by looking at foraging efficiency within the numerical experiments we find that there exists a foraging advantage to being gregarious. Author summary: Locusts are short-horned grasshoppers that live in two diametrically opposed behavioural states. In the first, solitarious, they will actively avoid other locusts, whereas in the second, gregarious, they will actively seek them out. It is in this gregarious state that locusts form the recognisable and destructive flying adult swarms. However, prior to swarm formation juvenile flightless locusts will form marching hopper bands and make their way from food source to food source. Predicting where these hopper bands might form is key to controlling locust outbreaks. Research has shown that changes in food distributions can affect the transition from solitarious to gregarious. In this paper we construct a mathematical model of locust-locust and locust-food interactions to investigate how food distributions affect the aggregation of juvenile locusts into groups, an important precursor to hopper bands. Our findings suggest that there is an optimal food distribution for group formation and that being gregarious increases a locust's ability to forage when food becomes more patchy. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
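The class of model described in result 205 can be written schematically as a nonlocal advection-diffusion equation coupled to a food field. The form below is generic and is not the paper's exact system (which tracks solitarious and gregarious subpopulations and their density-dependent switching); all symbols are illustrative.

```latex
% Schematic only: locust density u(x,t) with diffusion D, a nonlocal social
% interaction kernel Q, food taxis with strength \chi towards food c(x,t),
% and food consumption at rate \kappa. Gregarisation terms are omitted.
\begin{align}
  \frac{\partial u}{\partial t} &= \nabla \cdot \Big( D\,\nabla u - u\,\nabla (Q * u) - \chi\, u\,\nabla c \Big), \\
  (Q * u)(x,t) &= \int_{\Omega} Q(x-y)\, u(y,t)\, \mathrm{d}y, \\
  \frac{\partial c}{\partial t} &= -\kappa\, u\, c .
\end{align}
```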
206. Multitask learning over shared subspaces.
- Author
Menghi, Nicholas, Kacar, Kemal, and Penny, Will
- Subjects
SEQUENTIAL learning, ARTIFICIAL neural networks, CONCEPT learning, HUMAN multitasking, PSYCHOLOGISTS
- Abstract
This paper uses constructs from machine learning to define pairs of learning tasks that either shared or did not share a common subspace. Human subjects then learnt these tasks using a feedback-based approach and we hypothesised that learning would be boosted for shared subspaces. Our findings broadly supported this hypothesis with either better performance on the second task if it shared the same subspace as the first, or positive correlations over task performance for shared subspaces. These empirical findings were compared to the behaviour of a Neural Network model trained using sequential Bayesian learning and human performance was found to be consistent with a minimal capacity variant of this model. Networks with an increased representational capacity, and networks without Bayesian learning, did not show these transfer effects. We propose that the concept of shared subspaces provides a useful framework for the experimental study of human multitask and transfer learning. Author summary: How does knowledge gained from previous experience affect learning of new tasks? This question of "Transfer Learning" has been addressed by teachers, psychologists, and more recently by researchers in the fields of neural networks and machine learning. Leveraging constructs from machine learning, we designed pairs of learning tasks that either shared or did not share a common subspace. We compared the dynamics of transfer learning in humans with those of a multitask neural network model, finding that human performance was consistent with a minimal capacity variant of the model. Learning was boosted in the second task if the same subspace was shared between tasks. Additionally, accuracy between tasks was positively correlated but only when they shared the same subspace. Our results highlight the roles of subspaces, showing how they could act as a learning boost if shared, and be detrimental if not. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
207. netgsa: Fast computation and interactive visualization for topology-based pathway enrichment analysis.
- Author
Hellstern, Michael, Ma, Jing, Yue, Kun, and Shojaie, Ali
- Subjects
VISUALIZATION, STATISTICAL power analysis, PERSONAL computers, SOFTWARE development tools
- Abstract
Existing software tools for topology-based pathway enrichment analysis are either computationally inefficient, have undesirable statistical power, or require expert knowledge to leverage the methods' capabilities. To address these limitations, we have overhauled NetGSA, an existing topology-based method, to provide a computationally-efficient user-friendly tool that offers interactive visualization. Pathway enrichment analysis for thousands of genes can be performed in minutes on a personal computer without sacrificing statistical power. The new software also removes the need for expert knowledge by directly curating gene-gene interaction information from multiple external databases. Lastly, by utilizing the capabilities of Cytoscape, the new software also offers interactive and intuitive network visualization. Author summary: With the increase in publicly available pathway topology information, topology-based pathway enrichment methods have become effective tools to analyze omics data. While many different methods are available, none are uniformly best. This paper focused on overhauling an existing topology-based method, NetGSA. The three key improvements included dramatically reduced computation time so pathway enrichment can be performed within minutes on a personal computer, integration of publicly available pathway topology databases so users can easily leverage the entire capabilities of the NetGSA method, and facilitating interactive visualization of results through an interface with Cytoscape, a popular network visualization tool. The improved NetGSA was compared to the previous version as well as other similar pathway topology-based methods and achieves competitive statistical power. With these improvements and NetGSA's flexibility to address a diverse set of problems and data types, we believe that the new NetGSA can be a useful tool for practitioners. The most recent software is available on GitHub at https://github.com/mikehellstern/netgsa. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
208. Antibody Watch: Text mining antibody specificity from the literature.
- Author
Hsu, Chun-Nan, Chang, Chia-Hui, Poopradubsil, Thamolwan, Lo, Amanda, William, Karen A., Lin, Ko-Wei, Bandrowski, Anita, Ozyurt, Ibrahim Burak, Grethe, Jeffrey S., and Martone, Maryann E.
- Subjects
ANTIBODY specificity, CARRIER proteins, PROTEIN expression, IMMUNOGLOBULINS, KNOWLEDGE base, TEST systems, FLUORESCENT antibody technique
- Abstract
Antibodies are widely used reagents to test for expression of proteins and other antigens. However, they do not always bind specifically to the target proteins their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, it is still challenging to cover the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating alerts to users of problematic antibodies by extracting statements about antibody specificity reported in the literature. The extracted alerts can be used to construct an "Antibody Watch" knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify whether the snippets report that any antibody exhibits non-specificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform the classification task with 0.925 weighted F1-score, linking with 0.962 accuracy, and 0.914 weighted F1 when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining. Author summary: Antibodies are widely used reagents to test for the expression of proteins. However, antibodies are also a known source of reproducibility problems in biomedicine, as specificity and other issues can complicate their use. Information about how antibodies perform for specific applications is scattered across the biomedical literature and multiple websites. To alert scientists to reported antibody issues, we developed text mining algorithms that can identify specificity issues reported in the literature. We developed a deep neural network algorithm and performed a feasibility study on 2,223 papers. We leveraged Research Resource Identifiers (RRIDs), unique identifiers for antibodies and other biomedical resources, to match extracted specificity issues with particular antibodies. The results show that our system, called "Antibody Watch," can accurately perform specificity issue identification and RRID association with a weighted F-score over 0.914. From our test corpus, we identified 37 antibodies with 68 nonspecific issue statements. With Antibody Watch, for example, if one were looking for an antibody targeting beta-Amyloid 1–16, from 74 antibodies at dkNET Resource Reports (on 10/2/20), one would be alerted that "some non-specific bands were detected at 55 kDa in both WT and APP/PS1 mice with the 6E10 antibody..." [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
209. What do Eulerian and Hamiltonian cycles have to do with genome assembly?
- Author
Medvedev, Paul and Pop, Mihai
- Subjects
DE Bruijn graph, GENOMES
- Abstract
Many students are taught about genome assembly using the dichotomy between the complexity of finding Eulerian and Hamiltonian cycles (easy versus hard, respectively). This dichotomy is sometimes used to motivate the use of de Bruijn graphs in practice. In this paper, we explain that while de Bruijn graphs have indeed been very useful, the reason has nothing to do with the complexity of the Hamiltonian and Eulerian cycle problems. We give 2 arguments. The first is that a genome reconstruction is never unique and hence an algorithm for finding Eulerian or Hamiltonian cycles is not part of any assembly algorithm used in practice. The second is that even if an arbitrary genome reconstruction was desired, one could do so in linear time in both the Eulerian and Hamiltonian paradigms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
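To make the Eulerian side of result 209 concrete: error-free reads define edges between (k-1)-mers in a de Bruijn graph, and Hierholzer's algorithm walks one Eulerian-style path in time linear in the number of edges. The sketch below uses a made-up toy genome whose k-mers serve as the "reads"; real assemblers must also handle sequencing errors, repeats, and coverage, none of which appear here.

```python
# Minimal de Bruijn sketch: reads -> edges between (k-1)-mers, then one
# Eulerian-style reconstruction via Hierholzer's algorithm. Toy data only.
from collections import defaultdict

def de_bruijn(reads, k):
    graph = defaultdict(list)   # (k-1)-mer -> successor (k-1)-mers, one per edge
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            u, v = read[i:i + k - 1], read[i + 1:i + k]
            graph[u].append(v)
            outdeg[u] += 1
            indeg[v] += 1
    return graph, indeg, outdeg

def eulerian_path(graph, indeg, outdeg):
    # start where out-degree exceeds in-degree by one; fall back to any node
    start = next((n for n in graph if outdeg[n] - indeg[n] == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:                      # Hierholzer's algorithm
        v = stack[-1]
        if graph[v]:
            stack.append(graph[v].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

genome = "ACGTTGCAATG"
k = 4
reads = [genome[i:i + k] for i in range(len(genome) - k + 1)]   # one k-mer per read

g, indeg, outdeg = de_bruijn(reads, k)
path = eulerian_path(g, indeg, outdeg)
print(path[0] + "".join(node[-1] for node in path[1:]))          # reconstructs the genome
```

As the abstract stresses, such a reconstruction is generally not unique; this walk simply returns one of the possible genome orderings.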
210. Spatio-temporal spread of artemisinin resistance in Southeast Asia.
- Author
Flegg, Jennifer A., Kandanaarachchi, Sevvandi, Guerin, Philippe J., Dondorp, Arjen M., Nosten, Francois H., Otienoburu, Sabina Dahlström, and Golding, Nick
- Subjects
ANTIMALARIALS, ARTEMISININ, ARTEMISININ derivatives, DECISION support systems, PHARMACEUTICAL policy, MALARIA prevention, HERBICIDE resistance
- Abstract
Current malaria elimination targets must withstand a colossal challenge: resistance to the current gold-standard antimalarial drugs, namely artemisinin derivatives. If artemisinin resistance significantly expands to Africa or India, cases and malaria-related deaths are set to increase substantially. Spatial information on the changing levels of artemisinin resistance in Southeast Asia is therefore critical for health organisations to prioritise malaria control measures, but available data on artemisinin resistance are sparse. We use a comprehensive database from the WorldWide Antimalarial Resistance Network on the prevalence of non-synonymous mutations in the Kelch 13 (K13) gene, which are known to be associated with artemisinin resistance, and a Bayesian geostatistical model to produce spatio-temporal predictions of artemisinin resistance. Our maps of estimated prevalence show an expansion of the K13 mutation across the Greater Mekong Subregion from 2000 to 2022. Moreover, the period between 2010 and 2015 demonstrated the most spatial change across the region. Our model and maps provide important insights into the spatial and temporal trends of artemisinin resistance in a way that is not possible using data alone, thereby enabling improved spatial decision support systems at an unprecedentedly fine spatial resolution. By predicting for the first time spatio-temporal patterns and extents of artemisinin resistance at the subcontinent level, this study provides critical information for supporting malaria elimination goals in Southeast Asia. Author summary: Resistance to artemisinin derivatives has been confirmed in the Greater Mekong Subregion, with worrying signs of spread in India and, more recently, emergence in Rwanda and Uganda. This situation is dire given how the emergence and spread of resistance to other antimalarial drugs, chloroquine and later sulphadoxine–pyrimethamine, resulted in dramatic increases in malaria-related morbidity and mortality across sub-Saharan Africa in the 1990s. To eliminate malaria, up-to-date maps of artemisinin resistance are urgently needed; predictive models of the spread of drug resistance can make far-reaching, significant changes in our approach to malaria elimination by informing appropriate changes to drug policy. In this study, we have provided the first data-driven, predictive maps of the changing landscape of resistance to artemisinin derivatives in the Greater Mekong Subregion. These maps provide estimates where no data are available and can be used by health agencies to guide the prioritisation of surveillance for resistance, and policies to improve treatment and prevent the further spread of resistance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
211. Modelling cell shape in 3D structured environments: A quantitative comparison with experiments.
- Author
Link, Rabea, Jaggy, Mona, Bastmeyer, Martin, and Schwarz, Ulrich S.
- Subjects
CELL morphology, POTTS model, GEOMETRIC surfaces, CELL physiology, SEPARATION of variables, CELL migration, CELL sheets (Biology)
- Abstract
Cell shape plays a fundamental role in many biological processes, including adhesion, migration, division and development, but it is not clear which shape model best predicts three-dimensional cell shape in structured environments. Here, we compare different modelling approaches with experimental data. The shapes of single mesenchymal cells cultured in custom-made 3D scaffolds were compared by a Fourier method with surfaces that minimize area under the given adhesion and volume constraints. For the minimized-surface model, we found marked differences from the experimentally observed cell shapes, which necessitated the use of more advanced shape models. We used different variants of the cellular Potts model, which effectively includes both surface and bulk contributions. The simulations revealed that the Hamiltonian with linear area energy outperformed the elastic area constraint in accurately modelling the 3D shapes of cells in structured environments. Explicitly modelling the nucleus did not improve the accuracy of the simulated cell shapes. Overall, our work identifies effective methods for accurately modelling cellular shapes in complex environments. Author summary: Cell shape and forces have emerged as important determinants of cell function and thus their prediction is essential to describe and control the behaviour of cells in complex environments. While there exist well-established models for the two-dimensional shape of cells on flat substrates, it is less clear how cell shape should be modeled in three dimensions. Different from the philosophy of the vertex model often used for epithelial sheets, we find that models based only on cortical tension as a constant geometrical surface tension are not sufficient to describe the shape of single cells in 3D. Therefore, we employ different variants of the cellular Potts model, where either a target area is prescribed by an elastic constraint or the area energy is described with a linear surface tension. By comparing the simulated shapes to experimental images of cells in 3D scaffolds, we can identify parameters that accurately model 3D cell shape. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
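For readers unfamiliar with the two Hamiltonian variants compared in result 211, generic cellular Potts energies take roughly the following form. The notation is the textbook one, not necessarily the paper's: J is an interface energy, δ the Kronecker delta, σ the cell index of a lattice site, V and A the current cell volume and surface area, V0 and A0 their targets, and γ a constant surface tension.

```latex
% Elastic (quadratic) area constraint vs. linear area energy, schematically.
\begin{align}
  H_{\text{elastic}} &= \sum_{\langle i,j \rangle} J\,\bigl(1-\delta_{\sigma_i,\sigma_j}\bigr)
      + \lambda_V\,(V - V_0)^2 + \lambda_A\,(A - A_0)^2, \\
  H_{\text{linear}} &= \sum_{\langle i,j \rangle} J\,\bigl(1-\delta_{\sigma_i,\sigma_j}\bigr)
      + \lambda_V\,(V - V_0)^2 + \gamma\, A .
\end{align}
```

In the usual Metropolis scheme, a proposed copy of one site's cell index into a neighbouring site is accepted with probability min(1, exp(-ΔH/T)); the paper's comparison concerns which area term best reproduces the observed 3D shapes.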
212. For long-term sustainable software in bioinformatics.
- Author
Coelho, Luis Pedro
- Subjects
BIOINFORMATICS software, SCIENTIFIC literature, PROTEIN structure prediction, OPEN source software
- Abstract
The article discusses the importance of long-term sustainable software in bioinformatics. It highlights the challenges faced by research software, which is often developed under short-term research grants and can become obsolete after publication. The article emphasizes the need for maintenance and support of software tools to ensure their continued functionality and reproducibility. The author presents seven practices that their group follows to achieve long-term software maintenance, including reproducible research techniques, user testing, providing support in public forums, and making software available prior to publication. The article concludes with a call for the research community and institutions to be explicit about the expectations and commitments regarding software maintenance and support. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
213. Bayesian inference is facilitated by modular neural networks with different time scales.
- Author
Ichikawa, Kohei and Kaneko, Kunihiko
- Subjects
BAYESIAN field theory, MODULAR construction, RECURRENT neural networks
- Abstract
Various animals, including humans, have been suggested to perform Bayesian inferences to handle noisy, time-varying external information. For the brain to perform Bayesian inference, the prior distribution must be acquired and represented by sampling noisy external inputs. However, the mechanism by which neural activities represent such distributions has not yet been elucidated. Our findings reveal that networks with modular structures, composed of fast and slow modules, are adept at representing this prior distribution, enabling more accurate Bayesian inferences. Specifically, a modular network consisting of a main module connected to the input and output layers and a sub-module with slower neural activity connected only to the main module outperformed networks with uniform time scales. Prior information was represented specifically by the slow sub-module, which could integrate observed signals over an appropriate period and represent input means and variances. Accordingly, the neural network could effectively predict the time-varying inputs. Furthermore, by training the time scales of neurons starting from networks with uniform time scales and without modular structure, the above slow-fast modular network structure and the division of roles in which prior knowledge is selectively represented in the slow sub-modules spontaneously emerged. These results explain how the prior distribution for Bayesian inference is represented in the brain, provide insight into the relevance of modular structure with time scale hierarchy to information processing, and elucidate the significance of brain areas with slower time scales. Author summary: Bayesian inference is essential for predicting noisy inputs in the environment and is suggested to be common in various animals, including humans. For the brain to perform Bayesian inference, the prior distribution of the signal must be acquired and represented in the neural networks by sampling noisy inputs to estimate the posterior distribution of signals. By training recurrent neural networks to predict time-varying inputs, we demonstrated that those with modular structures, characterized by a main module exhibiting faster neural activity and a sub-module exhibiting slower neural activity, achieved highly accurate Bayesian inference to perform the required task. In this network, the prior distribution was specifically represented by the slower sub-module, which effectively integrated the earlier inputs. Furthermore, this modular structure with different time scales and division of representing roles emerged spontaneously through the learning process of Bayesian inference. Our results demonstrate a general mechanism for encoding prior distributions and highlight the importance of the brain's modular structure with time scale differentiation for Bayesian information processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
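A minimal discrete-time sketch of the fast/slow modular architecture described in result 213: leaky tanh units with two time constants, where the slow sub-module exchanges activity only with the main module and the input reaches the main module alone. The weights here are random and untrained, and all sizes and constants are illustrative, so this only shows the update equations, not the paper's trained networks.

```python
# Two-module leaky RNN with different time constants (Euler discretisation).
# Untrained random weights; all sizes and constants are illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n_fast, n_slow = 64, 16
tau_fast, tau_slow, dt = 1.0, 10.0, 1.0

W_ff = rng.normal(0, 1 / np.sqrt(n_fast), (n_fast, n_fast))   # fast -> fast
W_fs = rng.normal(0, 1 / np.sqrt(n_slow), (n_fast, n_slow))   # slow -> fast
W_sf = rng.normal(0, 1 / np.sqrt(n_fast), (n_slow, n_fast))   # fast -> slow
W_in = rng.normal(0, 1.0, (n_fast, 1))                        # input -> fast only

def step(x_fast, x_slow, u):
    r_fast, r_slow = np.tanh(x_fast), np.tanh(x_slow)
    dx_fast = (-x_fast + W_ff @ r_fast + W_fs @ r_slow + W_in @ u) / tau_fast
    dx_slow = (-x_slow + W_sf @ r_fast) / tau_slow   # slow module: no direct input
    return x_fast + dt * dx_fast, x_slow + dt * dx_slow

x_f, x_s = np.zeros(n_fast), np.zeros(n_slow)
for t in range(200):
    u = np.array([np.sin(0.1 * t) + rng.normal(0, 0.3)])   # noisy time-varying input
    x_f, x_s = step(x_f, x_s, u)
print(x_s[:5])   # slow states accumulate a longer history of the fast activity
```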
214. Robust and consistent measures of pattern separation based on information theory and demonstrated in the dentate gyrus.
- Author
Bird, Alexander D., Cuntz, Hermann, and Jedlicka, Peter
- Subjects
DENTATE gyrus, INFORMATION theory, ALZHEIMER'S disease, GRANULE cells, NEURAL circuitry
- Abstract
Pattern separation is a valuable computational function performed by neuronal circuits, such as the dentate gyrus, where dissimilarity between inputs is increased, reducing noise and increasing the storage capacity of downstream networks. Pattern separation is studied from both in vivo experimental and computational perspectives, and a number of different measures (such as orthogonalisation, decorrelation, or spike train distance) have been applied to quantify the process of pattern separation. However, these are known to give conclusions that can differ qualitatively depending on the choice of measure and the parameters used to calculate it. We here demonstrate that arbitrarily increasing sparsity, a noticeable feature of dentate granule cell firing and one that is believed to be key to pattern separation, typically improves classical measures of pattern separation, inappropriately, even up to the point where almost all information about the inputs is lost. Standard measures therefore cannot differentiate between pattern separation and pattern destruction, and give results that may depend on arbitrary parameter choices. We propose that techniques from information theory, in particular mutual information, transfer entropy, and redundancy, should be applied to penalise the potential for lost information (often due to increased sparsity) that is neglected by existing measures. We compare five commonly-used measures of pattern separation with three novel techniques based on information theory, showing that the latter can be applied in a principled way and provide a robust and reliable measure for comparing the pattern separation performance of different neurons and networks. We demonstrate our new measures on detailed compartmental models of individual dentate granule cells and a dentate microcircuit, and show how structural changes associated with epilepsy affect pattern separation performance. We also demonstrate how our measures of pattern separation can predict pattern completion accuracy. Overall, our measures solve a widely acknowledged problem in assessing the pattern separation of neural circuits such as the dentate gyrus, as well as the cerebellum and mushroom body. Finally we provide a publicly available toolbox allowing for easy analysis of pattern separation in spike train ensembles. Author summary: The hippocampus is a region of the brain strongly associated with spatial navigation and encoding of episodic memories. To perform these functions effectively it makes use of circuits that perform pattern separation, where redundant structure is removed from neural representations leaving only the most salient information. Pattern separation allows downstream pattern completion networks to better distinguish between similar situations. Pathological changes, caused by Alzheimer's, schizophrenia, or epilepsy, to the circuits that perform pattern separation are associated with reduced discriminative ability in both animal models and humans. Traditionally, pattern separation has been described alongside the complementary process of pattern completion, but more recent studies have focussed on the detailed neuronal and circuit features that contribute to pattern separation alone. We here show that traditional measures of pattern separation are inappropriate in this case, as they do not give consistent conclusions when parameters are changed and can confound pattern separation with the loss of important information.
We show that directly accounting for the information throughput of a pattern separation circuit can provide new measures of pattern separation that are robust and consistent, and allow for nuanced analysis of the structure-function relationship of such circuits and how this may be perturbed by pathology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
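The information-theoretic point in result 214 can be illustrated with a plug-in mutual information estimate between discretised input and output patterns: an output code that silences almost everything can look maximally sparse and "separated" yet carries no information about its inputs. The estimator and the toy patterns below are illustrative only and are not the paper's measures or toolbox.

```python
# Toy plug-in mutual-information estimate between discretised input and
# output population patterns. Not the paper's toolbox; it just shows that an
# output which destroys input information has zero MI, however sparse it is.
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI (in bits) between two sequences of hashable patterns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * np.log2(pj / ((px[x] / n) * (py[y] / n)))
    return mi

rng = np.random.default_rng(3)
inputs = [tuple(rng.integers(0, 2, 5)) for _ in range(2000)]     # 5-unit binary patterns

faithful = [tuple(1 - np.array(p)) for p in inputs]              # invertible recoding
destroyed = [tuple(np.zeros(5, dtype=int)) for _ in inputs]      # everything silenced

print(mutual_information(inputs, faithful))    # close to 5 bits: information preserved
print(mutual_information(inputs, destroyed))   # 0 bits: information lost
```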
215. Accounting for isoform expression increases power to identify genetic regulation of gene expression.
- Author
LaPierre, Nathan and Pimentel, Harold
- Subjects
GENETIC regulation, GENE expression, LOCUS (Genetics), GENETIC variation, ALTERNATIVE RNA splicing, RNA splicing
- Abstract
A core problem in genetics is molecular quantitative trait locus (QTL) mapping, in which genetic variants associated with changes in the molecular phenotypes are identified. One of the most-studied molecular QTL mapping problems is expression QTL (eQTL) mapping, in which the molecular phenotype is gene expression. It is common in eQTL mapping to compute gene expression by aggregating the expression levels of individual isoforms from the same gene and then performing linear regression between SNPs and this aggregated gene expression level. However, SNPs may regulate isoforms from the same gene in different directions due to alternative splicing, or only regulate the expression level of one isoform, causing this approach to lose power. Here, we examine a broader question: which genes have at least one isoform whose expression level is regulated by genetic variants? In this study, we propose and evaluate several approaches to answering this question, demonstrating that "isoform-aware" methods—those that account for the expression levels of individual isoforms—have substantially greater power to answer this question than standard "gene-level" eQTL mapping methods. We identify settings in which different approaches yield an inflated number of false discoveries or lose power. In particular, we show that calling an eGene if there is a significant association between a SNP and any isoform fails to control False Discovery Rate, even when applying standard False Discovery Rate correction. We show that similar trends are observed in real data from the GEUVADIS and GTEx studies, suggesting the possibility that similar effects are present in these consortia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
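A toy version of the contrast studied in result 215: when a SNP drives two isoforms of the same gene in opposite directions, a gene-level test on the summed expression can miss a signal that a per-isoform test (here naively Bonferroni-combined) detects. This sketch is illustrative only; it does not reproduce the specific isoform-aware methods or the FDR analyses evaluated in the paper.

```python
# Toy contrast: gene-level eQTL test on summed isoform expression vs. an
# "isoform-aware" test that checks each isoform and Bonferroni-combines the
# p-values. Simulated data; not any of the paper's actual methods.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
genotype = rng.integers(0, 3, n).astype(float)        # 0/1/2 allele counts

# Two isoforms regulated in opposite directions: the gene-level sum is flat.
iso1 = 0.8 * genotype + rng.normal(0, 1, n)
iso2 = -0.8 * genotype + rng.normal(0, 1, n)
gene = iso1 + iso2

def eqtl_p(y, g):
    return stats.linregress(g, y).pvalue

p_gene = eqtl_p(gene, genotype)
p_iso = min(eqtl_p(iso1, genotype), eqtl_p(iso2, genotype)) * 2   # Bonferroni over 2 isoforms

print(f"gene-level p = {p_gene:.3g}")              # typically non-significant
print(f"isoform-aware p = {min(p_iso, 1.0):.3g}")  # typically highly significant
```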
216. Compressive stress gradients direct mechanoregulation of anisotropic growth in the zebrafish jaw joint.
- Author
Godivier, Josepha, Lawrence, Elizabeth A., Wang, Mengdi, Hammond, Chrissy L., and Nowlan, Niamh C.
- Subjects
MANDIBULAR joint, FETAL movement, DYSPLASIA, BRACHYDANIO, JOINTS (Anatomy), FISH larvae, COMPRESSION loads, RANGE of motion of joints
- Abstract
Mechanical stimuli arising from fetal movements are critical factors underlying joint growth. Abnormal fetal movements negatively affect joint shape features with important implications for joint health, but the mechanisms by which mechanical forces from fetal movements influence joint growth are still unclear. In this research, we quantify zebrafish jaw joint growth in 3D in free-to-move and immobilised fish larvae between four and five days post fertilisation. We found that the main changes in size and shape in normally moving fish were in the ventrodorsal axis, while growth anisotropy was lost in the immobilised larvae. We next sought to determine the cell level activities underlying mechanoregulated growth anisotropy by tracking individual cells in the presence or absence of jaw movements, finding that the most dramatic changes in growth rates due to jaw immobility were in the ventrodorsal axis. Finally, we implemented mechanobiological simulations of joint growth with which we tested hypotheses relating specific mechanical stimuli to mechanoregulated growth anisotropy. Different types of mechanical stimulation were incorporated into the simulation to provide the mechanoregulated component of growth, in addition to the baseline (non-mechanoregulated) growth which occurs in the immobilised animals. We found that when average tissue stress over the opening and closing cycle of the joint was used as the stimulus for mechanoregulated growth, joint morphogenesis was not accurately predicted. Predictions were improved when using the stress gradients along the rudiment axes (i.e., the variation in magnitude of compression to magnitude of tension between local regions). However, the most accurate predictions were obtained when using the compressive stress gradients (i.e., the variation in compressive stress magnitude) along the rudiment axes. We conclude therefore that the dominant biophysical stimulus contributing to growth anisotropy during early joint development is the gradient of compressive stress experienced along the growth axes under cyclical loading. Author summary: The mechanical forces caused by fetal movements are important for normal development of the skeleton, and in particular for joint shape. Several common developmental musculoskeletal conditions such as developmental dysplasia of the hip and arthrogryposis are associated with reduced or restricted fetal movements. Paediatric joint malformations impair joint function and can be debilitating. To understand the origins of such conditions, it is essential to understand how the mechanical forces arising from movements influence joint growth and shape. In this research, we used a computational model of joint growth applied to the zebrafish jaw joint to study the impact of fetal movements on joint growth. We find that how the amount of compressive loading changes along the rudiment axes and over the loading cycle is critical to the normal growth of the developing joint. Our findings implicate gradients of compressive loading as a promising target when developing therapeutic strategies (such as targeted physiotherapy) for the treatment of musculoskeletal conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
217. A beta-Poisson model for infectious disease transmission.
- Author
Hilton, Joe and Hall, Ian
- Subjects
INFECTIOUS disease transmission, NEGATIVE binomial distribution, POISSON processes, RANDOM numbers, AKAIKE information criterion, ZOONOSES
- Abstract
Outbreaks of emerging and zoonotic infections represent a substantial threat to human health and well-being. These outbreaks tend to be characterised by highly stochastic transmission dynamics with intense variation in transmission potential between cases. The negative binomial distribution is commonly used as a model for transmission in the early stages of an epidemic as it has a natural interpretation as the convolution of a Poisson contact process and a gamma-distributed infectivity. In this study we expand upon the negative binomial model by introducing a beta-Poisson mixture model in which infectious individuals make contacts at the points of a Poisson process and then transmit infection along these contacts with a beta-distributed probability. We show that the negative binomial distribution is a limit case of this model, as is the zero-inflated Poisson distribution obtained by combining a Poisson-distributed contact process with an additional failure probability. We assess the beta-Poisson model's applicability by fitting it to secondary case distributions (the distribution of the number of subsequent cases generated by a single case) estimated from outbreaks covering a range of pathogens and geographical settings. We find that while the beta-Poisson mixture can achieve a closer fit to data than the negative binomial distribution, it is consistently outperformed by the negative binomial in terms of Akaike Information Criterion, making it a suboptimal choice on grounds of parsimony. The beta-Poisson performs similarly to the negative binomial model in its ability to capture features of the secondary case distribution such as overdispersion, prevalence of superspreaders, and the probability of a case generating zero subsequent cases. Despite this possible shortcoming, the beta-Poisson distribution may still be of interest in the context of intervention modelling since its structure allows for the simulation of measures which change contact structures while leaving individual-level infectivity unchanged, and vice versa. Author summary: The early stages of epidemics are characterised by dramatic variations in the number of new cases generated by each infectious individual, with some cases generating no new infections and some "superspreading" cases generating disproportionately large numbers of subsequent cases. In this study we introduce a mathematical model based on a two-step interpretation of infectious disease transmission: infectious individuals make a random number of contacts according to some fixed contact distribution and then infect their contacts with an infection probability which is unique to that specific infectious individual. This model has the advantage of generalizing more commonly used models of early epidemic dynamics, while allowing for policy analyses which assess the impact of measures which affect social contact behaviour and infectiousness across contacts separately. We find that while our model performs at least as well as pre-existing models in modelling individual-level capacity to generate new infections, the extra mathematical complexity our model introduces is not justified by commonly-used measures of parsimony. This suggests that our model could be applicable in specific policy settings but does not offer a substantial improvement over past approaches in a purely observational setting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
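The two-step beta-Poisson mixture of result 217 is easy to sample directly, which also makes the intervention argument concrete: the contact rate and the individual-level transmission probability can be altered independently. The parameter values below are illustrative, not estimates from the paper, and the negative binomial draw is only a rough mean-matched comparison.

```python
# Sampling secondary-case counts from the beta-Poisson mixture described in
# the abstract: each case makes Poisson(lam) contacts and transmits along
# each with an individual Beta(a, b) probability. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)

def beta_poisson_sample(lam, a, b, size):
    contacts = rng.poisson(lam, size)          # contacts per infectious case
    p = rng.beta(a, b, size)                   # that case's transmission probability
    return rng.binomial(contacts, p)           # secondary cases

lam, a, b = 20.0, 0.4, 2.0
z_bp = beta_poisson_sample(lam, a, b, 100_000)

# Negative binomial with matching mean for comparison (dispersion k chosen ad hoc).
mean = lam * a / (a + b)
k = 0.5
z_nb = rng.negative_binomial(k, k / (k + mean), 100_000)

for name, z in [("beta-Poisson", z_bp), ("negative binomial", z_nb)]:
    print(f"{name}: mean={z.mean():.2f}, var={z.var():.2f}, P(Z=0)={np.mean(z == 0):.2f}")
```

A contact-reduction intervention would scale lam while leaving (a, b) unchanged, whereas an infectivity-reducing intervention would alter (a, b) only; this separation is the structural advantage the abstract highlights.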
218. Foldy: An open-source web application for interactive protein structure analysis.
- Author
Roberts, Jacob B., Nava, Alberto A., Pearson, Allison N., Incha, Matthew R., Valencia, Luis E., Ma, Melody, Rao, Abhay, and Keasling, Jay D.
- Subjects
PROTEIN structure, WEB-based user interfaces, LIFE sciences, PROTEIN analysis, ARTIFICIAL intelligence, SYNTHETIC biology
- Abstract
Foldy is a cloud-based application that allows non-computational biologists to easily utilize advanced AI-based structural biology tools, including AlphaFold and DiffDock. With many deployment options, it can be employed by individuals, labs, universities, and companies in the cloud without requiring hardware resources, but it can also be configured to utilize locally available computers. Foldy enables scientists to predict the structure of proteins and complexes up to 6000 amino acids with AlphaFold, visualize Pfam annotations, and dock ligands with AutoDock Vina and DiffDock. In our manuscript, we detail Foldy's interface design, deployment strategies, and optimization for various user scenarios. We demonstrate its application through case studies including rational enzyme design and analyzing proteins with domains of unknown function. Furthermore, we compare Foldy's interface and management capabilities with other open and closed source tools in the field, illustrating its practicality in managing complex data and computation tasks. Our manuscript underlines the benefits of Foldy as a day-to-day tool for life science researchers, and shows how Foldy can make modern tools more accessible and efficient. Author summary: Foldy is a cloud-based application that enables scientists to use AI-based structural biology tools such as AlphaFold and DiffDock without software expertise. With many different deployment options, it can be set up by individuals, labs, universities, and companies in the cloud with no need for hardware resources. Foldy can predict the structure of proteins and complexes up to 6000 amino acids, visualize Pfam annotations, and dock ligands with AutoDock Vina and DiffDock. Some structures are visible to the public on the Lawrence Berkeley Labs Foldy instance, and can be viewed at https://foldy.lbl.gov. Our manuscript highlights the user interface, deployment options, relative strengths of Foldy compared to existing tools, and some past applications of Foldy. It's an accessible solution for researchers who are not software experts. Many deployment options are possible and we highlight two: one can be set up in minutes, and the other can handle the traffic of thousands of users and hundreds of thousands of protein structures and docked ligands. This makes advanced AI-based tools more widely available, paving the way for accelerating life science research. By developing an easy-to-use platform, our work demonstrates that even computationally expensive AI-based tools like AlphaFold can be made accessible to a wide audience. Improvements in the accessibility of computational tools will allow more biologists to apply such tools to more problems more easily. We are hopeful that Foldy addresses the growing need to make revolutionary computational tools accessible to more researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
219. Automated morphological phenotyping using learned shape descriptors and functional maps: A novel approach to geometric morphometrics.
- Author
Thomas, Oshane O., Shen, Hongyu, Raaum, Ryan L., Harcourt-Smith, William E. H., Polk, John D., and Hasegawa-Johnson, Mark
- Subjects
GEOMETRIC approach, MORPHOMETRICS, BIOLOGICAL variation, LIFE sciences, DIGITIZATION
- Abstract
The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement limiting the number of landmarks and introducing observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation, area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as Genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and to auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks, and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit. Author summary: The quantification of biological shape variation has relied on expert placement of relatively small subsets of landmarks and their analysis using tools of geometric morphometrics (GM). This paper introduces morphVQ, a novel, automated, learning-based approach to shape analysis that approximates the non-rigid correspondence between surface models of bone. With accurate functional correspondence between bones, we can characterize the shape variation within a dataset. Our results demonstrate that morphVQ performs similarly to manual digitization and to an existing automated phenotyping approach, auto3DGM. morphVQ has the advantage of greater computational efficiency while capturing shape variation directly from surface model representations of bone. We can classify biological shapes to the Genus level with comparable accuracy to previous approaches, and we can demonstrate which aspects of bone shape differ most between groups. The ability to provide comparable accuracy in a Genus-level classification with features extracted from morphVQ further guarantees the validity of this approach. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
220. A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed β-diversity measures.
- Author
Zhai, Hongxuan and Fukuyama, Julia
- Subjects
SHOTGUN sequencing, METAGENOMICS, COMMUNITIES, EUCLIDEAN distance
- Abstract
k-mer-based distances are often used to describe the differences between communities in metagenome sequencing studies because of their computational convenience and history of effectiveness. Although k-mer-based distances do not use information about taxon abundances, we show that one class of k-mer distances between metagenomes (the Euclidean distance between k-mer spectra, or EKS distances) are very closely related to a class of phylogenetically-informed β-diversity measures that do explicitly use both the taxon abundances and information about the phylogenetic relationships among the taxa. Furthermore, we show that both of these distances can be interpreted as using certain features of the taxon abundances that are related to the phylogenetic tree. Our results allow practitioners to perform phylogenetically-informed analyses when they only have k-mer data available and provide a theoretical basis for using k-mer spectra with relatively small values of k (on the order of 4-5). They are also useful for analysts who wish to know more of the properties of any method based on k-mer spectra and provide insight into one class of phylogenetically-informed β-diversity measures. Author summary: Microbiologists have two major strategies for understanding the bacterial communities present in the environment: shotgun metagenome sequencing and amplicon sequencing. Both involve taking samples from the environment, extracting DNA from those samples, and sequencing the extracted DNA. They have different strengths and give different kinds of information about the communities. Because they give different kinds of information, methods for analyzing microbiome data tend to be developed for and used on just one kind of study. In this paper, we show a strong relationship between a set of methods for measuring distances between samples in shotgun metagenome sequencing datasets (the k-mer-based distances) and a set of methods for measuring distances between samples in amplicon sequencing datasets (the phylogenetically-informed beta diversity measures). This is a convenient correspondence because k-mer spectra are easier to extract from shotgun metagenome sequencing datasets than the taxon abundances that would be needed to compute the phylogenetically-informed β diversities. Therefore, if an analyst would like to compute phylogenetically-informed distances between communities from a shotgun metagenome sequencing dataset, our results show that they can work directly with the k-mer spectra and not worry about estimating taxon abundances. The results also imply that any of the many methods that are based on k-mer spectra are implicitly using phylogenetic information. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
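The EKS distance discussed in result 220 is straightforward to compute: count k-mers, normalise, and take the Euclidean distance between the resulting spectra. The toy sequences and the choice to normalise to relative frequencies below are assumptions for illustration and may differ from the paper's exact conventions.

```python
# k-mer spectra and the Euclidean distance between them (the EKS distance
# discussed above), on toy sequences. Normalisation choices may differ from
# the paper's; relative frequencies are used here for simplicity.
from itertools import product
import numpy as np

def kmer_spectrum(seq, k=4):
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:                  # skip k-mers containing N etc.
            counts[index[kmer]] += 1
    return counts / counts.sum()           # relative frequencies

rng = np.random.default_rng(6)
sample_a = "".join(rng.choice(list("ACGT"), 5000, p=[0.3, 0.2, 0.2, 0.3]))
sample_b = "".join(rng.choice(list("ACGT"), 5000, p=[0.2, 0.3, 0.3, 0.2]))

eks = np.linalg.norm(kmer_spectrum(sample_a) - kmer_spectrum(sample_b))
print(f"EKS distance (k=4): {eks:.4f}")
```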
221. Mechanistic model for human brain metabolism and its connection to the neurovascular coupling.
- Author
Sundqvist, Nicolas, Sten, Sebastian, Thompson, Peter, Andersson, Benjamin Jan, Engström, Maria, and Cedersund, Gunnar
- Subjects
BRAIN metabolism, NUCLEAR magnetic resonance spectroscopy, OXYGEN in the blood, LACTATES, CEREBRAL circulation, METABOLIC models, VISUAL perception, BLOOD flow
- Abstract
The neurovascular and neurometabolic couplings (NVC and NMC) connect cerebral activity, blood flow, and metabolism. This interconnection is used in, for instance, functional imaging, which analyses the blood-oxygen-level-dependent (BOLD) signal. The mechanisms underlying the NVC are complex, which warrants a model-based analysis of data. We have previously developed a mechanistically detailed model for the NVC, and others have proposed detailed models for cerebral metabolism. However, existing metabolic models are still not fully utilizing available magnetic resonance spectroscopy (MRS) data and are not connected to detailed models for NVC. Therefore, we herein present a new model that integrates mechanistic modelling of both MRS and BOLD data. The metabolic model covers central metabolism, using a minimal set of interactions, and can describe time-series data for glucose, lactate, aspartate, and glutamate, measured after visual stimuli. Statistical tests confirm that the model can describe both estimation data and predict independent validation data not used for model training. The interconnected NVC model can simultaneously describe BOLD data and can be used to predict expected metabolic responses in experiments where metabolism has not been measured. This model is a step towards a useful and mechanistically detailed model for cerebral blood flow and metabolism, with potential applications in both basic research and clinical settings. Author summary: The neurovascular and neurometabolic couplings are highly central for several clinical imaging techniques since these frequently use blood oxygenation (the BOLD signal) as a proxy for neuronal activity. This relationship is described by the highly complex neurovascular and neurometabolic couplings, which describe the balancing between increased metabolic demand and blood flow, and which involve several cell types and regulatory systems, all of which change dynamically over time. While there are previous works that describe the neurovascular coupling in detail, neither we nor others have developed connections to corresponding mechanistic models for the third aspect, the metabolic aspect. Furthermore, magnetic resonance spectroscopy (MRS) data for such modelling is readily available. In this paper we present a minimal mechanistic model that can describe the metabolic response to visual stimuli. The model is trained to describe experimental data for the relative change in metabolic concentrations of several metabolites in the visual cortex during stimulation. The model is also validated against independent validation data that was not used for model training. Finally, we also connect this metabolic model to a detailed mechanistic model of the neurovascular coupling, showing that the model can describe both the metabolic response and a neurovascular response simultaneously. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
222. The effects of base rate neglect on sequential belief updating and real-world beliefs.
- Author
Ashinoff, Brandon K., Buck, Justin, Woodford, Michael, and Horga, Guillermo
- Subjects
JUDGMENT (Psychology), MEDICAL errors, EMPIRICAL research
- Abstract
Base-rate neglect is a pervasive bias in judgment that is conceptualized as underweighting of prior information and can have serious consequences in real-world scenarios. This bias is thought to reflect variability in inferential processes but empirical support for a cohesive theory of base-rate neglect with sufficient explanatory power to account for longer-term and real-world beliefs is lacking. A Bayesian formalization of base-rate neglect in the context of sequential belief updating predicts that belief trajectories should exhibit dynamic patterns of dependence on the order in which evidence is presented and its consistency with prior beliefs. To test this, we developed a novel 'urn-and-beads' task that systematically manipulated the order of colored bead sequences and elicited beliefs via an incentive-compatible procedure. Our results in two independent online studies confirmed the predictions of the sequential base-rate neglect model: people exhibited beliefs that are more influenced by recent evidence and by evidence inconsistent with prior beliefs. We further found support for a noisy-sampling inference model whereby base-rate neglect results from rational discounting of noisy internal representations of prior beliefs. Finally, we found that model-derived indices of base-rate neglect—including noisier prior representation—correlated with propensity for unusual beliefs outside the laboratory. Our work supports the relevance of Bayesian accounts of sequential base-rate neglect to real-world beliefs and hints at strategies to minimize deleterious consequences of this pervasive bias. Author summary: Base-rate neglect is a common bias in judgment, a bias defined by a tendency to underuse older information when forming a new belief. This bias can have serious consequences in the real world. Base-rate neglect is often cited as a source of errors in medical and legal decisions, and in many other socially relevant contexts. Despite its broad societal relevance, it is unclear whether current theories capture the expression of base-rate neglect in sequential belief formation, and perhaps more crucially why people have this bias in the first place. In this paper, we find support for a model that describes how base-rate neglect influences belief formation over time, showing that people behave in a way that matches theoretical predictions. Knowing how base-rate neglect influences beliefs over time suggests possible strategies that could be implemented in the future to minimize its impact. We also find support for a model which may explain why people exhibit base-rate neglect in the first place. This model suggests that people's representation of older information in the brain is noisy and that it is therefore rational to underuse this older information to some extent depending on how noisy or unreliable its representation is. Finally, we show that our measures of base-rate neglect and noise in the representation of older information correlate with variation in real-world belief oddity, suggesting that these models capture belief-formation processes likely to dictate functioning in real-world settings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
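A minimal sketch of sequential belief updating with base-rate neglect in an urn-and-beads setting, assuming a single underweighting exponent on the prior (omega = 1 recovers the normative Bayesian update); the parameter values and bead sequence are invented for illustration, and this is not the authors' fitted model.

def update(prior_A, like_A, like_B, omega):
    # one belief update; the prior is underweighted via the exponent omega (base-rate neglect)
    num = like_A * prior_A ** omega
    den = num + like_B * (1.0 - prior_A) ** omega
    return num / den

q = 0.7          # fraction of red beads in urn A (urn B contains 1 - q red beads)
omega = 0.6      # omega < 1: recent evidence dominates over the accumulated prior
belief = 0.5     # initial belief that the beads come from urn A
for bead in ["red", "red", "blue", "red"]:
    like_A = q if bead == "red" else 1.0 - q
    like_B = (1.0 - q) if bead == "red" else q
    belief = update(belief, like_A, like_B, omega)
    print(f"after {bead:4s}: P(urn A) = {belief:.3f}")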
223. Comparing T cell receptor repertoires using optimal transport.
- Author
-
Olson, Branden J., Schattgen, Stefan A., Thomas, Paul G., Bradley, Philip, and Matsen IV, Frederick A.
- Subjects
PROTEIN receptors ,STATISTICAL models - Abstract
The complexity of entire T cell receptor (TCR) repertoires makes their comparison a difficult but important task. Current methods of TCR repertoire comparison can incur a high loss of distributional information by considering overly simplistic sequence- or repertoire-level characteristics. Optimal transport methods form a suitable approach for such comparison given some distance or metric between values in the sample space, with appealing theoretical and computational properties. In this paper we introduce a nonparametric approach to comparing empirical TCR repertoires that applies the Sinkhorn distance, a fast, contemporary optimal transport method, and a recently-created distance between TCRs called TCRdist. We show that our methods identify meaningful differences between samples from distinct TCR distributions for several case studies, and compete with more complicated methods despite minimal modeling assumptions and a simpler pipeline. Author summary: T cells are critical for a successful adaptive immune response, largely due to the expression of highly diverse receptor proteins on their surfaces. These T cell receptors (TCRs) recognize peptides that may be foreign invaders such as viruses or bacteria. Because of this, immunologists are often interested in comparing different sets (or repertoires) of these TCRs in hopes of identifying groups of particular interest, such as TCRs that are responding to a particular vaccination using pre- and post-vaccination samples. Current methods of comparing TCR repertoires either rely on statistical models which may not adequately describe the data, use summary statistics that may lose information, or are difficult to interpret. We present a complementary method of comparing TCR repertoires that detects significantly different TCRs between two given repertoires using a distance rather than a model, summary statistics, or dimension reduction. We demonstrate that our method can identify biologically meaningful repertoire differences using several case studies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
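A minimal numpy sketch of the entropy-regularized (Sinkhorn) optimal-transport comparison described above, using a crude mismatch count between toy CDR3 strings as a stand-in for TCRdist; the sequences, clone weights, and regularization strength are invented for illustration.

import numpy as np

def crude_dist(a, b):
    # mismatches over the aligned prefix plus the length difference; a stand-in for TCRdist
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def sinkhorn_cost(w1, w2, C, eps=2.0, n_iter=2000):
    # Sinkhorn-Knopp iterations for entropy-regularized optimal transport
    K = np.exp(-C / eps)
    u = np.ones(len(w1))
    for _ in range(n_iter):
        v = w2 / (K.T @ u)
        u = w1 / (K @ v)
    P = u[:, None] * K * v[None, :]          # transport plan between the two repertoires
    return (P * C).sum()

rep1 = ["CASSLGQAYEQYF", "CASSLGGAYEQYF", "CASSPTSGGYEQY"]
rep2 = ["CASSLGQAYEQYF", "CASRDRTGNTIYF"]
w1 = np.array([0.5, 0.3, 0.2])               # toy clone frequencies
w2 = np.array([0.6, 0.4])
C = np.array([[crude_dist(a, b) for b in rep2] for a in rep1], dtype=float)
print(f"Sinkhorn transport cost between repertoires: {sinkhorn_cost(w1, w2, C):.2f}")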
224. Role of path information in visual perception of joint stiffness.
- Author
-
West Jr., A. Michael, Huber, Meghan E., and Hogan, Neville
- Subjects
MECHANICAL impedance ,POWER transmission ,VISUAL perception ,MOTOR learning ,COMPUTER simulation ,VELOCITY - Abstract
Humans have an astonishing ability to extract hidden information from the movement of others. In previous work, subjects observed the motion of a simulated stick-figure, two-link planar arm and estimated its stiffness. Fundamentally, stiffness is the relation between force and displacement. Given that subjects were unable to physically interact with the simulated arm, they were forced to make their estimates solely based on observed kinematic information. Remarkably, subjects were able to correctly correlate their stiffness estimates with changes in the simulated stiffness, despite the lack of force information. We hypothesized that subjects were only able to do this because the controller used to produce the simulated arm's movement, composed of oscillatory motions driving mechanical impedances, resembled the controller humans use to produce their own movement. However, it is still unknown what motion features subjects used to estimate stiffness. Human motion exhibits systematic velocity-curvature patterns, and it has previously been shown that these patterns play an important role in perceiving and interpreting motion. Thus, we hypothesized that manipulating the velocity profile should affect subjects' ability to estimate stiffness. To test this, we changed the velocity profile of the simulated two-link planar arm while keeping the simulated joint paths the same. Even with manipulated velocity signals, subjects were still able to estimate changes in simulated joint stiffness. However, when subjects were shown the same simulated path with different velocity profiles, they perceived motions that followed a veridical velocity profile to be less stiff than motions with a non-veridical profile. These results suggest that path information (displacement) predominates over temporal information (velocity) when humans use visual observation to estimate stiffness. Author summary: Stiffness of the arms or legs, the force evoked by displacement, plays an important role in managing physical interaction with objects in the world. Measuring stiffness fundamentally requires physical contact. Nevertheless, a previous study showed that humans have a remarkable ability to estimate stiffness solely from visual observation of a computer simulation, with no physical contact. The present study extended that work and found that this ability was robust. In particular, the ability to estimate simulated stiffness was largely unaffected by changing the time course of simulated motion. This was surprising given the extensive prior research reporting that distorting velocity patterns influences motion perception. The results presented in this paper indicate that geometric information (path) predominates over temporal information (velocity) in the perception of stiffness. Given the highly-cited relationship between motor action and perception, it also suggests that the structure of the motor control system we used in the simulations is a reasonable approximation of the neural motor controller. This work provides insight into humans' representation of motor behavior and how humans interpret and learn from the motor actions of others. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
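A minimal sketch of the stimulus manipulation described above: the same two-link joint path played back with a veridical (minimum-jerk) versus a non-veridical (constant-speed) time course; the link lengths, joint angles, and profiles are invented for illustration and this is not the authors' simulation.

import numpy as np

def min_jerk(t, T):
    # minimum-jerk time scaling s(t) in [0, 1] (bell-shaped speed profile)
    s = t / T
    return 10 * s**3 - 15 * s**4 + 6 * s**5

T, n = 1.0, 200
t = np.linspace(0.0, T, n)
q_start, q_end = np.array([0.3, 1.2]), np.array([1.0, 0.6])   # shoulder, elbow angles (rad)
l1, l2 = 0.30, 0.33                                           # link lengths (m)

for name, s in [("veridical (min-jerk)", min_jerk(t, T)), ("non-veridical (constant speed)", t / T)]:
    q = q_start + s[:, None] * (q_end - q_start)              # same joint path, different timing
    x = l1 * np.cos(q[:, 0]) + l2 * np.cos(q[:, 0] + q[:, 1])
    y = l1 * np.sin(q[:, 0]) + l2 * np.sin(q[:, 0] + q[:, 1])
    speed = np.hypot(np.gradient(x, t), np.gradient(y, t))
    print(f"{name:32s} peak hand speed = {speed.max():.2f} m/s")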
225. Inference of B cell clonal families using heavy/light chain pairing information.
- Author
-
Ralph, Duncan K. and Matsen IV, Frederick A.
- Subjects
B cells ,B cell receptors ,MONOCLONAL antibodies ,IMMUNE response ,PROBLEM solving - Abstract
Next generation sequencing of B cell receptor (BCR) repertoires has become a ubiquitous tool for understanding the antibody-mediated immune response: it is now common to have large volumes of sequence data coding for both the heavy and light chain subunits of the BCR. However, until the recent development of high throughput methods of preserving heavy/light chain pairing information, these samples contained no explicit information on which heavy chain sequence pairs with which light chain sequence. One of the first steps in analyzing such BCR repertoire samples is grouping sequences into clonally related families, where each stems from a single rearrangement event. Many methods of accomplishing this have been developed, however, none so far has taken full advantage of the newly-available pairing information. This information can dramatically improve clustering performance, especially for the light chain. The light chain has traditionally been challenging for clonal family inference because of its low diversity and consequent abundance of non-clonal families with indistinguishable naive rearrangements. Here we present a method of incorporating this pairing information into the clustering process in order to arrive at a more accurate partition of the data into clonally related families. We also demonstrate two methods of fixing imperfect pairing information, which may allow for simplified sample preparation and increased sequencing depth. Finally, we describe several other improvements to the partis software package. Author summary: Antibodies form part of the adaptive immune response, and are critical to immunity acquired by both vaccination and infection. Next generation sequencing of the B cell receptor (BCR) repertoire provides a broad and highly informative view of the DNA sequences from which antibodies arise. Until recently, however, this sequencing data was not able to pair together the two domains (from separate chromosomes) that make up a functional antibody. In this paper we present several methods to improve analysis of the new paired data that does pair together sequence data for complete antibodies. We first show a method that better groups together sequences stemming from the same ancestral cell, solving a problem called "clonal family inference." We then show two methods that can correct for various imperfections in the data's identification of which sequences pair together to form complete antibodies, which together may allow for significantly simplified experimental methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
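A toy sketch of how heavy/light pairing information can sharpen clonal clustering, assuming a crude per-site mismatch distance and single-linkage clustering (partis itself uses a likelihood-based method; the sequences and threshold here are invented): cells 3 and 4 share a heavy CDR3 but differ in the light chain, so the joint distance keeps them apart.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def mismatch_frac(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pair_dist(c1, c2):
    # average per-site mismatch across heavy and light CDR3s (toy joint distance)
    return 0.5 * mismatch_frac(c1[0], c2[0]) + 0.5 * mismatch_frac(c1[1], c2[1])

# toy paired (heavy CDR3, light CDR3) sequences; the light chain is far less diverse
cells = [("CARDYYGSGSYFDYW", "CQQYNSYPLTF"),
         ("CARDYYGSGSYFDFW", "CQQYNSYPLTF"),   # clonally related to cell 1
         ("CARGGTTVTTTFDYW", "CQQYNSYPLTF"),   # unrelated, but same light CDR3
         ("CARGGTTVTTTFDYW", "CQQSYSTPYTF")]   # same heavy CDR3 as cell 3, different light chain
condensed = [pair_dist(a, b) for i, a in enumerate(cells) for b in cells[i + 1:]]
labels = fcluster(linkage(condensed, method="single"), t=0.08, criterion="distance")
print("clonal family labels:", labels)          # cells 1 and 2 cluster; 3 and 4 stay separate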
226. Social dilemmas of sociality due to beneficial and costly contagion.
- Author
-
Cooney, Daniel B., Morris, Dylan H., Levin, Simon A., Rubenstein, Daniel I., and Romanczuk, Pawel
- Subjects
SOCIAL interaction ,SOCIAL contact ,SOCIAL evolution ,DILEMMA ,INNOVATION adoption - Abstract
Levels of sociality in nature vary widely. Some species are solitary; others live in family groups; some form complex multi-family societies. Increased levels of social interaction can allow for the spread of useful innovations and beneficial information, but can also facilitate the spread of harmful contagions, such as infectious diseases. It is natural to assume that these contagion processes shape the evolution of complex social systems, but an explicit account of the dynamics of sociality under selection pressure imposed by contagion remains elusive. We consider a model for the evolution of sociality strategies in the presence of both a beneficial and costly contagion. We study the dynamics of this model at three timescales: using a susceptible-infectious-susceptible (SIS) model to describe contagion spread for given sociality strategies, a replicator equation to study the changing fractions of two different levels of sociality, and an adaptive dynamics approach to study the long-time evolution of the population level of sociality. For a wide range of assumptions about the benefits and costs of infection, we identify a social dilemma: the evolutionarily-stable sociality strategy (ESS) is distinct from the collective optimum—the level of sociality that would be best for all individuals. In particular, the ESS level of social interaction is greater (respectively less) than the social optimum when the good contagion spreads more (respectively less) readily than the bad contagion. Our results shed light on how contagion shapes the evolution of social interaction, but reveals that evolution may not necessarily lead populations to social structures that are good for any or all. Author summary: Social interactions among individuals in animal groups provide a range of evolutionary benefits and risks. On the one hand, social contacts can promote learning and the adoption of innovations; on the other hand, such interactions can expose individuals to the harms of infectious disease. In this paper, we study the evolution of social gregariousness in the presence of both a beneficial and a costly contagion, which are jointly spreading in a population. Assuming that, all else equal, individuals prefer increased exposure to the good contagion and decreased exposure to the bad contagion, we characterize a socially-optimal level of gregariousness that best balances the relative exposure to the two contagions. However, using the mathematical frameworks of replicator equations and adaptive dynamics, we show that evolutionary competition between sociality strategies produces a social dilemma: individuals endeavoring to maximize their fitnesses drive the population to a level of gregariousness at which all individuals are worse off. In some cases, social behavior can disappear entirely—even when any level of gregariousness would be advantageous for the population as a whole. We also propose mechanisms to help overcome the social dilemma, showing how groups can help to establish more efficient levels of social interaction. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
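A toy numerical sketch of the gap between the collective optimum and the evolutionarily stable sociality level, assuming SIS endemic prevalences for a beneficial and a harmful contagion and a linear benefit/cost payoff; all rates and the search grid are invented, and the paper's replicator and adaptive-dynamics analysis is far more general. In this parameterization the good contagion spreads more readily than the bad one, and the ESS comes out above the collective optimum, in line with the abstract.

import numpy as np

def prevalence(beta, gamma, sigma_res):
    # endemic SIS prevalence in a resident population with sociality sigma_res
    return max(0.0, 1.0 - gamma / (beta * sigma_res))

def p_infected(beta, gamma, sigma_self, sigma_res):
    # long-run infection probability of a focal individual with sociality sigma_self
    lam = beta * sigma_self * prevalence(beta, gamma, sigma_res)
    return lam / (lam + gamma)

def payoff(sigma_self, sigma_res, benefit=1.0, cost=2.0):
    good = p_infected(0.8, 0.3, sigma_self, sigma_res)   # useful innovations spread readily
    bad = p_infected(0.5, 0.3, sigma_self, sigma_res)    # harmful contagion spreads less readily
    return benefit * good - cost * bad

sigmas = np.linspace(0.5, 3.0, 300)
collective_opt = sigmas[np.argmax([payoff(s, s) for s in sigmas])]
best_reply = lambda s_res: sigmas[np.argmax([payoff(s, s_res) for s in sigmas])]
ess = min(sigmas, key=lambda s: abs(best_reply(s) - s))   # fixed point of the best reply
print(f"collective optimum ~ {collective_opt:.2f}, ESS ~ {ess:.2f}")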
227. HELIOS: High-speed sequence alignment in optics.
- Author
-
Maleki, Ehsan, Akbari Rokn Abadi, Saeedeh, and Koohi, Somayyeh
- Subjects
SEQUENCE alignment ,OPTICS ,AMINO acid sequence ,OPTICAL polarization ,CIRCULAR RNA - Abstract
In response to the imperfections of current sequence alignment methods, originating from the inherent serialism within their corresponding electrical systems, a few optical approaches for biological data comparison have been proposed recently. However, due to their low performance, arising from their inefficient coding schemes, this paper presents a novel all-optical high-throughput method for aligning DNA, RNA, and protein sequences, named HELIOS. The HELIOS method employs highly sophisticated operations to locate character matches, single or multiple mutations, and single or multiple indels within various biological sequences. On the other hand, the HELIOS optical architecture exploits high-speed processing and operational parallelism in optics, by adopting wavelength and polarization of optical beams. For evaluation, the functionality and accuracy of the HELIOS method are verified through behavioral and optical simulation studies, while its complexity and performance are estimated through analytical computation. The accuracy evaluations indicate that the HELIOS method achieves a precise pairwise alignment of two sequences, highly similar to those of Smith-Waterman, Needleman-Wunsch, BLAST, MUSCLE, ClustalW, ClustalΩ, T-Coffee, Kalign, and MAFFT. According to our performance evaluations, the HELIOS optical architecture outperforms all alternative electrical and optical algorithms in terms of processing time and memory requirement, relying on its highly sophisticated method and optical architecture. Moreover, the employed compact coding scheme substantially increases the number of input characters that can be encoded, and hence it offers reduced time and space complexities compared to the electrical and optical alternatives. This makes the HELIOS method and optical architecture highly applicable for biomedical applications. Author summary: The character-by-character alignment of two long biological sequences, i.e. DNA, RNA, and protein, is a tedious task, but essential for recognizing homologies, relationships, and variations. In this case, every alteration, including mutations (substitutions) and indels (insertions or deletions), is vital and required for many biological developments like diagnosis, medicine, and vaccination. However, the applicability of current sequence alignment methods is limited, specifically in processing time and memory usage, due to their inherent serialism and imperfections of electrical systems, as well as inefficient coding schemes of optical approaches. This leads to approximately quadratic run-time and space requirements in terms of input sequence lengths, making the real-time alignment of large datasets an expensive and laborious process. Hence, proposing a superior alignment method in terms of accuracy, performance, and applicability can promote biological research and developments. Here, we show that we can overcome the long-lasting and challenging problems in the sequence alignment procedure by exploiting optics as a novel computing technology. In this manner, we propose a novel method and its optical architecture for alignment of DNA, RNA, and protein sequences by exploiting high-speed processing and operational parallelism in optics. As our simulation studies confirm, it provides accurate sequence alignment while outperforming the most widely used electrical and optical alternatives in terms of processing time and memory requirements. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
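For reference, a minimal version of one of the electronic baselines named in the evaluation above (Needleman-Wunsch global alignment by dynamic programming); the scores and example strings are arbitrary, and the optical architecture itself is of course not reproducible in a few lines of Python.

import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    # classical global alignment score by dynamic programming
    n, m = len(a), len(b)
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = gap * np.arange(n + 1)
    F[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i, j] = max(F[i - 1, j - 1] + s, F[i - 1, j] + gap, F[i, j - 1] + gap)
    return F[n, m]

print("alignment score:", needleman_wunsch("GATTACA", "GCATGCU"))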
228. Synaptic reshaping of plastic neuronal networks by periodic multichannel stimulation with single-pulse and burst stimuli.
- Author
-
Kromer, Justus A. and Tass, Peter A.
- Subjects
NEURAL circuitry ,ALZHEIMER'S disease ,DEEP brain stimulation ,PARKINSON'S disease ,OBSESSIVE-compulsive disorder ,BRAIN stimulation ,NEUROPLASTICITY ,ACTION potentials - Abstract
Synaptic dysfunction is associated with several brain disorders, including Alzheimer's disease, Parkinson's disease (PD) and obsessive compulsive disorder (OCD). Utilizing synaptic plasticity, brain stimulation is capable of reshaping synaptic connectivity. This may pave the way for novel therapies that specifically counteract pathological synaptic connectivity. For instance, in PD, novel multichannel coordinated reset stimulation (CRS) was designed to counteract neuronal synchrony and down-regulate pathological synaptic connectivity. CRS was shown to entail long-lasting therapeutic aftereffects in PD patients and related animal models. This is in marked contrast to conventional deep brain stimulation (DBS) therapy, where PD symptoms return shortly after stimulation ceases. In the present paper, we study synaptic reshaping by periodic multichannel stimulation (PMCS) in networks of leaky integrate-and-fire (LIF) neurons with spike-timing-dependent plasticity (STDP). During PMCS, phase-shifted periodic stimulus trains are delivered to segregated neuronal subpopulations. Harnessing STDP, PMCS leads to changes of the synaptic network structure. We found that the PMCS-induced changes of the network structure depend on both the phase lags between stimuli and the shape of individual stimuli. Single-pulse stimuli and burst stimuli with low intraburst frequency down-regulate synapses between neurons receiving stimuli simultaneously. In contrast, burst stimuli with high intraburst frequency up-regulate these synapses. We derive theoretical approximations of the stimulation-induced network structure. This enables us to formulate stimulation strategies for inducing a variety of network structures. Our results provide testable hypotheses for future pre-clinical and clinical studies and suggest that periodic multichannel stimulation may be suitable for reshaping plastic neuronal networks to counteract pathological synaptic connectivity. Furthermore, we provide novel insight on how the stimulus type may affect the long-lasting outcome of conventional DBS. This may strongly impact parameter adjustment procedures for clinical DBS, which, so far, primarily focused on acute effects of stimulation. Author summary: Synaptic dysfunction accompanies several brain disorders, such as Alzheimer's, Parkinson's disease and obsessive compulsive disorder. For therapeutic purposes, stimulation is delivered to disease-related brain areas. Brain stimulation therapies that manipulate synaptic connections in disease-related brain areas may provide long-lasting symptom relief. In this computational and theoretical study, we study periodic multichannel stimulation, a stimulation technique that allows for manipulating selected synaptic populations. Stimulus trains are delivered to multiple neuronal subpopulations in order to trigger neuronal responses. Using a model network of leaky integrate-and-fire neurons with spike-timing-dependent plasticity and theoretical analysis, we show how the relative timings between and the shape of delivered stimuli can be tuned to down-regulate certain synaptic connections while up-regulating others. Single-pulse stimuli triggered precise neuronal responses and were suitable for inducing a variety of synaptic network structures. When burst stimuli were employed, tuning the intraburst frequency allowed for distinguishing between down- and up-regulation of synaptic connections within individual neuronal subpopulations. 
Our work provides a theoretical basis for selecting suitable stimulation parameters for inducing long-lasting therapeutic effects in patients suffering from neurological disorders. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
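A toy calculation of the net STDP drive on a synapse between two neurons whose stimuli arrive with a fixed phase lag, assuming an asymmetric exponential plasticity window and perfectly stimulus-locked spiking; it ignores the neuronal and network dynamics captured by the paper's LIF simulations, so it only illustrates how phase lags and stimulus shape enter such a calculation.

import numpy as np

def stdp_window(dt, a_plus=0.010, a_minus=0.012, tau_plus=0.02, tau_minus=0.02):
    # asymmetric exponential STDP window; dt = t_post - t_pre, in seconds
    # (simultaneous spikes, dt = 0, fall on the depressing branch in this toy)
    return np.where(dt > 0, a_plus * np.exp(-dt / tau_plus), -a_minus * np.exp(dt / tau_minus))

def net_drive(lag, period=0.1, n_cycles=50, spikes_per_burst=1, intraburst=0.005):
    # stimulus-locked spike times for the pre- and postsynaptic subpopulations
    base = np.arange(n_cycles) * period
    offsets = np.arange(spikes_per_burst) * intraburst
    pre = (base[:, None] + offsets).ravel()
    post = (base[:, None] + lag + offsets).ravel()
    dt = post[None, :] - pre[:, None]            # all spike pairings
    dt = dt[np.abs(dt) < 0.05]                   # ignore pairs more than 50 ms apart
    return stdp_window(dt).sum()

for lag in [0.0, 0.01, 0.03]:
    print(f"phase lag {1e3*lag:4.0f} ms: single-pulse drive {net_drive(lag):+.3f}, "
          f"burst drive {net_drive(lag, spikes_per_burst=4):+.3f}")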
229. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.
- Author
-
Wu, Stephen Gang, Wang, Yuxuan, Jiang, Wu, Oyetunde, Tolutola, Yao, Ruilian, Zhang, Xuehong, Shimizu, Kazuyuki, Tang, Yinjie J., and Bao, Forrest Sheng
- Subjects
METABOLIC flux analysis ,SUPPORT vector machines ,CELL metabolism ,MACHINE learning ,STOICHIOMETRY - Abstract
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux () that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross-validation. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest in studying model organisms under particular carbon sources, bias of the fluxomes in the dataset may limit the applicability of machine learning models. This problem can be resolved as more papers on 13C-MFA are published for non-model species. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
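A sketch of the two computational ingredients named above, using scikit-learn on synthetic stand-in data: a grid search with 10-fold cross-validation for an SVM regressor, followed by a least-squares projection of predicted fluxes onto a toy stoichiometric constraint (the paper uses quadratic programming, which additionally allows bounds); none of the numbers correspond to the MFlux dataset.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# stand-in data: rows = studies, columns = encoded influential factors, target = one flux
X = rng.normal(size=(120, 6))
y = X @ np.array([0.5, -0.2, 0.8, 0.0, 0.3, -0.1]) + 0.1 * rng.normal(size=120)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = GridSearchCV(model, {"svr__C": [0.1, 1, 10, 100], "svr__gamma": ["scale", 0.01, 0.1]},
                    cv=10, scoring="neg_mean_absolute_error")
grid.fit(X, y)
print("best SVM parameters:", grid.best_params_, " CV MAE:", -grid.best_score_)

# project predicted fluxes onto the steady-state constraint S v = 0 (least-squares correction)
S = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])   # toy 2-metabolite, 3-reaction network
v_pred = np.array([1.2, 0.9, 1.1])
v_adj = v_pred - S.T @ np.linalg.solve(S @ S.T, S @ v_pred)
print("adjusted fluxes:", np.round(v_adj, 3), " S @ v:", np.round(S @ v_adj, 6))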
230. Unraveling the mechanisms of surround suppression in early visual processing.
- Author
-
Li, Yao and Young, Lai-Sang
- Subjects
LATERAL geniculate body ,VISUAL cortex ,NEURAL circuitry ,BIOLOGICAL systems ,MODELS & modelmaking ,SIGNAL processing - Abstract
This paper uses mathematical modeling to study the mechanisms of surround suppression in the primate visual cortex. We present a large-scale neural circuit model in which, where available, anatomical data and parameters from earlier realistic modeling work are used. The remaining parameters are chosen to produce model outputs that emulate experimentally observed size-tuning curves. Our two main results are: (i) we discovered the character of the long-range connections in Layer 6 responsible for surround effects in the input layers; and (ii) we showed that a net-inhibitory feedback, i.e., feedback that excites I-cells more than E-cells, from Layer 6 to Layer 4 is conducive to producing surround properties consistent with experimental data. These results are obtained through parameter selection and model analysis. The effects of nonlinear recurrent excitation and inhibition are also discussed. A feature that distinguishes our model from previous modeling work on surround suppression is that we have tried to reproduce realistic lengthscales that are crucial for quantitative comparison with data. Due to its size and the large number of unknown parameters, the model is computationally challenging. We demonstrate a strategy that involves first locating baseline values for relevant parameters using a linear model, followed by the introduction of nonlinearities where needed. We find such a methodology effective, and propose it as a possibility in the modeling of complex biological systems. Author summary: The visual cortex is a part of the cortex that processes visual signals. The visual signal from the retina is relayed through the lateral geniculate nucleus (LGN), and enters the primary visual cortex directly. The primary visual cortex consists of many layers. A phenomenon called surround suppression, which means a reduction of neuronal activities in response to the presence of neighboring stimuli, is well-observed in all layers of the primate primary visual cortex. In this paper, we built a large-scale mathematical model consisting of the LGN input and tens of thousands of neuron groups in Layer 4 and Layer 6 of the primary visual cortex to study the mechanism of surround suppression. This model is constrained by both anatomical structures and realistic physiological parameters. Remaining unknown parameters are reverse-engineered from experimental data. We showed how the character of the surround suppression depends on the long-range connections in the feedback layer, as well as the feedback current from Layer 6 to Layer 4. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
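The size-tuning behaviour being emulated can be summarized, purely descriptively, by a ratio-of-Gaussians curve in which a broader surround divides a narrower centre; the widths and gains below are invented, and this is only a compact description of surround suppression, not the paper's mechanistic LGN/Layer 6/Layer 4 circuit.

import numpy as np
from scipy.special import erf

def size_tuning(diam, k_c=1.0, k_s=0.6, w_c=0.4, w_s=1.2):
    # ratio-of-Gaussians descriptive model: narrow centre drive divided by broad surround drive
    center = erf(diam / (2.0 * w_c)) ** 2
    surround = erf(diam / (2.0 * w_s)) ** 2
    return k_c * center / (1.0 + k_s * surround)

for d in [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]:
    print(f"stimulus diameter {d:4.2f} deg -> response {size_tuning(d):.3f}")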
231. Using Hawkes Processes to model imported and local malaria cases in near-elimination settings.
- Author
-
Unwin, H. Juliette T., Routledge, Isobel, Flaxman, Seth, Rizoiu, Marian-Andrei, Lai, Shengjie, Cohen, Justin, Weiss, Daniel J., Mishra, Swapnil, and Bhatt, Samir
- Subjects
EMERGING infectious diseases ,DISEASE outbreaks ,MALARIA ,INFECTIOUS disease transmission ,COMMUNICABLE diseases - Abstract
Developing new methods for modelling infectious disease outbreaks is important for monitoring transmission and developing policy. In this paper we propose using semi-mechanistic Hawkes Processes for modelling malaria transmission in near-elimination settings. Hawkes Processes are well-founded mathematical methods that enable us to combine the benefits of both statistical and mechanistic models to recreate and forecast disease transmission beyond just malaria outbreak scenarios. These methods have been successfully used in numerous applications such as social media and earthquake modelling, but are not yet widespread in epidemiology. By using domain-specific knowledge, we can both recreate transmission curves for malaria in China and Eswatini and disentangle the proportion of cases which are imported from those that are community-based. Author summary: This paper introduces a mathematically well-founded method for modelling infectious disease outbreaks, known as Hawkes Processes. These semi-mechanistic models are relatively new to the infectious diseases toolkit and enable us to combine disease-specific information such as the infectious profile with statistical rigour to recreate temporal disease transmission. We show that these methods are well suited to modelling malaria in communities close to eliminating malaria—in particular China and Eswatini—where we are able to disentangle the contribution of exogenous (external) transmission and endogenous (person-to-person) transmission. This is particularly important for developing policies when countries are approaching elimination. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
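A compact sketch of the semi-mechanistic idea described above: a Hawkes process with a constant background rate standing in for imported cases and an exponential self-excitation kernel standing in for local transmission, simulated by Ogata thinning; the rates below are invented, and the paper's model additionally encodes the malaria infectious profile and is fitted to data.

import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    # Ogata thinning; mu = background (importation) rate, alpha = branching ratio,
    # alpha*beta*exp(-beta*dt) = self-excitation (local transmission) kernel
    rng = np.random.default_rng(seed)
    events, t, excitation = [], 0.0, 0.0
    while True:
        lam_bar = mu + excitation                    # intensity only decays until the next event
        wait = rng.exponential(1.0 / lam_bar)
        t += wait
        if t > T:
            return np.array(events)
        excitation *= np.exp(-beta * wait)           # kernel decay over the waiting time
        if rng.random() < (mu + excitation) / lam_bar:
            events.append(t)
            excitation += alpha * beta               # each accepted case adds a jump

mu, alpha, beta = 0.2, 0.6, 0.1
cases = simulate_hawkes(mu, alpha, beta, T=365.0)
# attribution: probability each case was generated by the background process (imported)
p_imported = np.array([mu / (mu + alpha * beta * np.exp(-beta * (t - cases[cases < t])).sum())
                       for t in cases])
print(f"{cases.size} cases, mean P(imported) = {p_imported.mean():.2f}")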
232. Principles for data analysis workflows.
- Author
-
Stoudt, Sara, Vásquez, Váleri N., and Martinez, Ciera C.
- Subjects
DATA analysis ,COMPUTER software development ,WORKFLOW management ,STUDENT research ,UNIVERSITY research ,TRANSMISSION of sound ,WORKFLOW - Abstract
A systematic and reproducible "workflow"—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
233. PIEZO1 and the mechanism of the long circulatory longevity of human red blood cells.
- Author
-
Rogers, Simon and Lew, Virgilio L.
- Subjects
ERYTHROCYTES ,CALCIUM-dependent potassium channels ,LONGEVITY ,POTASSIUM channels ,SICKLE cell anemia ,FETAL hemoglobin ,CELL size ,STRAINS & stresses (Mechanics) - Abstract
Human red blood cells (RBCs) have a circulatory lifespan of about four months. Under constant oxidative and mechanical stress, but devoid of organelles and deprived of biosynthetic capacity for protein renewal, RBCs undergo substantial homeostatic changes, progressive densification followed by late density reversal among others, changes assumed to have been harnessed by evolution to sustain the rheological competence of the RBCs for as long as possible. The unknown mechanisms by which this is achieved are the subject of this investigation. Each RBC traverses capillaries between 1000 and 2000 times per day, roughly one transit per minute. A dedicated Lifespan model of RBC homeostasis was developed as an extension of the RCM introduced in the previous paper to explore the cumulative patterns predicted for repetitive capillary transits over a standardized lifespan period of 120 days, using experimental data to constrain the range of acceptable model outcomes. Capillary transits were simulated by periods of elevated cell/medium volume ratios and by transient deformation-induced permeability changes attributed to PIEZO1 channel mediation as outlined in the previous paper. The first unexpected finding was that quantal density changes generated during single capillary transits cease accumulating after a few days and cannot account for the observed progressive densification of RBCs on their own, thus ruling out the quantal hypothesis. The second unexpected finding was that the documented patterns of RBC densification and late reversal could only be emulated by the implementation of a strict time-course of decay in the activities of the calcium and Na/K pumps, suggestive of a selective mechanism enabling the extended longevity of RBCs. The densification pattern over most of the circulatory lifespan was determined by calcium pump decay whereas late density reversal was shaped by the pattern of Na/K pump decay. A third finding was that both quantal changes and pump-decay regimes were necessary to account for the documented lifespan pattern, neither sufficient on their own. A fourth new finding revealed that RBCs exposed to levels of PIEZO1-mediated calcium permeation above certain thresholds in the circulation could develop a pattern of early or late hyperdense collapse followed by delayed density reversal. When tested over much reduced lifespan periods, the results reproduced the known circulatory fate of irreversible sickle cells, the cell subpopulation responsible for vaso-occlusion and for most of the clinical manifestations of sickle cell disease. Analysis of the results provided an insightful new understanding of the mechanisms driving the changes in RBC homeostasis during circulatory aging in health and disease. Author summary: The average circulatory lifespan of human red blood cells is about four months, amounting to about 200000 capillary transits. Among the many documented age-related changes red cells experience during this long sojourn, the most relevant to homeostasis control comprise progressive densification with late density reversal, decline in the activities of calcium and sodium-potassium pumps, and slow inverse changes in their original sodium and potassium contents. Early experimental results have long established the view that these changes result from the cumulative effects of myriad capillary transits. However, many aspects of this process remain inaccessible to in vivo investigation.
This prompted us to attempt a modelling approach applying a dedicated extension to our original red cell model. The results relegated the cumulative mechanism to a secondary role and exposed surprising critical roles for the declining patterns of the calcium and sodium-potassium pumps, as if harnessed by evolution to extend the circulatory longevity of cells within volume ranges that enable optimal rheological performance. The mechanism the model revealed implicated complex interactions between PIEZO1, the calcium-activated potassium channel KCNN4, the anion exchanger AE1, and the calcium and sodium-potassium pumps. These studies proved the model potential for exploring red cell homeostasis in health and disease. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
234. Hands-on training about overfitting.
- Author
-
Demšar, Janez and Zupan, Blaž
- Subjects
MACHINE learning ,MOLECULAR biologists ,COMPUTATIONAL biology ,CONCEPT learning ,DATA science - Abstract
Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis. Author summary: Every teacher strives for an a-ha moment, a sudden revelation by the student who gained a fundamental insight she will always remember. In the past years, authors of this paper have been tailoring their courses in machine learning to include material that could lead students to such discoveries. We aim to expose machine learning to practitioners–not only computer scientists but also molecular biologists and students of biomedicine, that is, the end-users of bioinformatics' computational approaches. In this article, we lay out a course that aims to teach about overfitting, one of the key concepts in machine learning that needs to be understood, mastered, and avoided in data science applications. We propose a hands-on approach that uses an open-source workflow-based data science toolbox that combines data visualization and machine learning. In the proposed training about overfitting, we first deceive the students, then expose the problem, and finally challenge them to find the solution. In the paper, we present three lessons in overfitting and associated data analysis workflows and motivate the use of introduced computation methods by relating them to concepts conveyed by instructors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
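The lessons described above are built as graphical Orange workflows, but the same deceive-then-expose demonstration can be scripted; below is a minimal scikit-learn version of the classic symptom, with invented data, where increasing model capacity drives training error down while held-out error climbs.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 30))[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0.0, 0.3, 30)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in [1, 3, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(f"degree {degree:2d}: train MSE {mean_squared_error(y_tr, model.predict(x_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, model.predict(x_te)):.3f}")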
235. Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins.
- Author
-
Roche, Rahmatullah, Bhattacharya, Sutanu, and Bhattacharya, Debswapna
- Subjects
AMINO acid sequence ,PROTEIN folding ,PROTEIN structure ,MEMBRANE proteins - Abstract
The Crystallography and NMR System (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS. Author summary: Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning heavily rely on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. 
DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
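As a toy illustration of the underlying idea of recovering 3D coordinates from inter-residue distance information, here is classical multidimensional scaling applied to a complete, noise-free distance matrix; DConStruct itself works from thresholded contact and distance maps with a hierarchical, self-correcting optimization, which this sketch does not attempt.

import numpy as np

def classical_mds(D, dim=3):
    # recover coordinates (up to rotation/reflection) from a full Euclidean distance matrix
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double centering
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))            # toy stand-in for C-alpha positions
D = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
rec = classical_mds(D)
D_rec = np.linalg.norm(rec[:, None] - rec[None, :], axis=-1)
print("max pairwise-distance error after reconstruction:", np.abs(D - D_rec).max())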
236. OpenAWSEM with Open3SPN2: A fast, flexible, and accessible framework for large-scale coarse-grained biomolecular simulations.
- Author
-
Lu, Wei, Bueno, Carlos, Schafer, Nicholas P., Moller, Joshua, Jin, Shikai, Chen, Xun, Chen, Mingchen, Gu, Xinyu, Davtyan, Aram, de Pablo, Juan J., and Wolynes, Peter G.
- Subjects
GRAPHICS processing units ,MOLECULAR dynamics ,PROTEIN folding ,BIOPHYSICS ,SOFTWARE frameworks ,TRANSCRIPTION factors - Abstract
We present OpenAWSEM and Open3SPN2, new cross-compatible implementations of coarse-grained models for protein (AWSEM) and DNA (3SPN2) molecular dynamics simulations within the OpenMM framework. These new implementations retain the chemical accuracy and intrinsic efficiency of the original models while adding GPU acceleration and the ease of forcefield modification provided by OpenMM's Custom Forces software framework. By utilizing GPUs, we achieve around a 30-fold speedup in protein and protein-DNA simulations over the existing LAMMPS-based implementations running on a single CPU core. We showcase the benefits of OpenMM's Custom Forces framework by devising and implementing two new potentials that allow us to address important aspects of protein folding and structure prediction and by testing the ability of the combined OpenAWSEM and Open3SPN2 to model protein-DNA binding. The first potential is used to describe the changes in effective interactions that occur as a protein becomes partially buried in a membrane. We also introduced an interaction to describe proteins with multiple disulfide bonds. Using simple pairwise disulfide bonding terms results in unphysical clustering of cysteine residues, posing a problem when simulating the folding of proteins with many cysteines. We now can computationally reproduce Anfinsen's early Nobel prize winning experiments by using OpenMM's Custom Forces framework to introduce a multi-body disulfide bonding term that prevents unphysical clustering. Our protein-DNA simulations show that the binding landscape is funneled towards structures that are quite similar to those found using experiments. In summary, this paper provides a simulation tool for the molecular biophysics community that is both easy to use and sufficiently efficient to simulate large proteins and large protein-DNA systems that are central to many cellular processes. These codes should facilitate the interplay between molecular simulations and cellular studies, which have been hampered by the large mismatch between the time and length scales accessible to molecular simulations and those relevant to cell biology. Author summary: The cell's most important pieces of machinery are large complexes of proteins often along with nucleic acids. From the ribosome, to CRISPR-Cas9, to transcription factors and DNA-wrangling proteins like the SMC-Kleisins, these complexes allow organisms to replicate and enable cells to respond to environmental cues. Computer simulation is a key technology that can be used to connect physical theories with biological reality. Unfortunately, the time and length scales accessible to molecular simulation have not kept pace with our ambition to study the cell's molecular factories. Many simulation codes also unfortunately remain effectively locked away from the user community who need to modify them as more of the underlying physics is learned. In this paper, we present OpenAWSEM and Open3SPN2, two new easy-to-use and easy to modify implementations of efficient and accurate coarse-grained protein and DNA simulation forcefields that can now be run hundreds of times faster than before, thereby making studies of large biomolecular machines more facile. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
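A minimal example of the OpenMM Custom Forces mechanism that the implementations above build on: a user-defined energy expression attached to a two-bead system. The expression, masses, and parameter values are arbitrary; this is not the AWSEM or 3SPN2 force field. Requires OpenMM 7.6 or later for the plain openmm import.

import openmm
import openmm.unit as unit

# two coarse-grained beads joined by a user-defined harmonic bond
system = openmm.System()
system.addParticle(100.0)                               # bead masses in amu
system.addParticle(100.0)

bond = openmm.CustomBondForce("0.5*k*(r-r0)^2")         # energy expression parsed by OpenMM
bond.addPerBondParameter("k")                           # kJ/mol/nm^2
bond.addPerBondParameter("r0")                          # nm
bond.addBond(0, 1, [500.0, 0.38])
system.addForce(bond)

integrator = openmm.LangevinMiddleIntegrator(300 * unit.kelvin, 1.0 / unit.picosecond,
                                             10.0 * unit.femtoseconds)
context = openmm.Context(system, integrator)
context.setPositions([openmm.Vec3(0.0, 0.0, 0.0), openmm.Vec3(0.5, 0.0, 0.0)] * unit.nanometer)
integrator.step(5000)
print(context.getState(getEnergy=True).getPotentialEnergy())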
237. A mesoscopic simulator to uncover heterogeneity and evolutionary dynamics in tumors.
- Author
-
Jiménez-Sánchez, Juan, Martínez-Rubio, Álvaro, Popov, Anton, Pérez-Beteta, Julián, Azimzade, Youness, Molina-García, David, Belmonte-Beitia, Juan, Calvo, Gabriel F., and Pérez-García, Víctor M.
- Subjects
PARTIAL differential equations ,HETEROGENEITY ,SPATIOTEMPORAL processes ,TUMORS ,CANCER invasiveness - Abstract
Increasingly complex in silico modeling approaches offer a way to simultaneously access cancerous processes at different spatio-temporal scales. High-level models, such as those based on partial differential equations, are computationally affordable and allow large tumor sizes and long temporal windows to be studied, but miss the discrete nature of many key underlying cellular processes. Individual-based approaches provide a much more detailed description of tumors, but have difficulties when trying to handle full-sized real cancers. Thus, there exists a trade-off between the integration of macroscopic and microscopic information, now widely available, and the ability to attain clinical tumor sizes. In this paper we put forward a stochastic mesoscopic simulation framework that incorporates key cellular processes during tumor progression while keeping computational costs to a minimum. Our framework captures a physical scale that allows both the incorporation of microscopic information, tracking the spatio-temporal emergence of tumor heterogeneity and the underlying evolutionary dynamics, and the reconstruction of clinically sized tumors from high-resolution medical imaging data, with the additional benefit of low computational cost. We illustrate the functionality of our modeling approach for the case of glioblastoma, a paradigm of tumor heterogeneity that remains extremely challenging in the clinical setting. Author summary: Computer simulation based on mathematical models provides a way to improve the understanding of complex processes in oncology. In this paper we develop a stochastic mesoscopic simulation approach that incorporates key cellular processes while keeping computational costs to a minimum. Our methodology captures the development of tumor heterogeneity and the underlying evolutionary dynamics. The physical scale considered allows microscopic information to be included, tracking the spatio-temporal evolution of tumor heterogeneity and reconstructing clinically sized tumors from high-resolution medical imaging data, with a low computational cost. We illustrate the functionality of the modeling approach for the case of glioblastoma, an epitome of heterogeneity in tumors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
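A toy stochastic, voxel-level growth model in the mesoscopic spirit described above, tracking cell counts per lattice site with space-limited division, death, and migration on a periodic grid; all rates and sizes are invented, and the simulator in the paper additionally tracks clonal populations, phenotypes, and imaging-derived geometries.

import numpy as np

rng = np.random.default_rng(1)
L, K = 60, 1000                      # lattice side; carrying capacity of one voxel (cells)
p_div, p_die, p_mig = 0.25, 0.03, 0.05
N = np.zeros((L, L), dtype=np.int64)
N[L // 2, L // 2] = 100              # seed lesion

for step in range(1, 151):
    p_birth = np.clip(p_div * (1.0 - N / K), 0.0, 1.0)   # space-limited division
    N += rng.binomial(N, p_birth) - rng.binomial(N, p_die)
    movers = rng.binomial(N, p_mig)                       # cells leaving each voxel
    N -= movers
    share = movers // 4                                   # split movers among 4 neighbours
    for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:
        N += np.roll(share, shift, axis=axis)             # periodic boundary for simplicity
    N += movers - 4 * share                               # remainder stays put
    if step % 50 == 0:
        print(f"step {step:3d}: {N.sum():8d} cells in {np.count_nonzero(N):4d} occupied voxels")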
238. Towards Reproducible Descriptions of Neuronal Network Models.
- Author
-
Nordlie, Eilen, Gewaltig, Marc-Oliver, and Plesser, Hans Ekkehard
- Subjects
BIOLOGICAL neural networks ,NEURAL stimulation ,COMPUTATIONAL complexity ,NEUROSCIENCES ,COMPUTATIONAL biology - Abstract
Progress in science depends on the effective exchange of ideas among scientists. New ideas can be assessed and criticized in a meaningful manner only if they are formulated precisely. This applies to simulation studies as well as to experiments and theories. But after more than 50 years of neuronal network simulations, we still lack a clear and common understanding of the role of computational models in neuroscience as well as established practices for describing network models in publications. This hinders the critical evaluation of network models as well as their re-use. We analyze here 14 research papers proposing neuronal network models of different complexity and find widely varying approaches to model descriptions, with regard to both the means of description and the ordering and placement of material. We further observe great variation in the graphical representation of networks and the notation used in equations. Based on our observations, we propose a good model description practice, composed of guidelines for the organization of publications, a checklist for model descriptions, templates for tables presenting model structure, and guidelines for diagrams of networks. The main purpose of this good practice is to trigger a debate about the communication of neuronal network models in a manner comprehensible to humans, as opposed to machine-readable model description languages. We believe that the good model description practice proposed here, together with a number of other recent initiatives on data-, model-, and software-sharing, may lead to a deeper and more fruitful exchange of ideas among computational neuroscientists in years to come. We further hope that work on standardized ways of describing—and thinking about—complex neuronal networks will lead the scientific community to a clearer understanding of high-level concepts in network dynamics, and will thus lead to deeper insights into the function of the brain. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
239. Reconciling kinetic and thermodynamic models of bacterial transcription.
- Author
-
Morrison, Muir, Razo-Mejia, Manuel, and Phillips, Rob
- Subjects
STATISTICAL equilibrium ,CHEMICAL kinetics ,BINDING sites ,STATISTICAL mechanics ,GENE expression ,FLAVOR - Abstract
The study of transcription remains one of the centerpieces of modern biology with implications in settings from development to metabolism to evolution to disease. Precision measurements using a host of different techniques including fluorescence and sequencing readouts have raised the bar for what it means to quantitatively understand transcriptional regulation. In particular, our understanding of the simplest genetic circuit is sufficiently refined both experimentally and theoretically that it has become possible to carefully discriminate between different conceptual pictures of how this regulatory system works. This regulatory motif, originally posited by Jacob and Monod in the 1960s, consists of a single transcriptional repressor binding to a promoter site and inhibiting transcription. In this paper, we show how seven distinct models of this so-called simple-repression motif, based both on thermodynamic and kinetic thinking, can be used to derive the predicted levels of gene expression and shed light on the often surprising past success of the thermodynamic models. These different models are then invoked to confront a variety of different data on mean, variance and full gene expression distributions, illustrating the extent to which such models can and cannot be distinguished, and suggesting a two-state model with a distribution of burst sizes as the most potent of the seven for describing the simple-repression motif. Author summary: With the advent of new technologies allowing us to query biological activity with ever increasing precision, the deluge of quantitative biological data demands quantitative models. Transcriptional regulation—a feature that lies at the core of our understanding of cellular control in myriad contexts ranging from development to disease—is no exception, with single-cell and single-molecule techniques being routinely deployed to study cellular decision making. These data have served as a fertile proving ground to test models of transcription that mainly come in two flavors: thermodynamic models (based on equilibrium statistical mechanics) and kinetic models (based on chemical kinetics). In this paper we study the correspondence between these theoretical frameworks in the context of the simple repression motif, a common regulatory architecture in prokaryotes in which a repressor with a single binding site regulates expression. We explore the consequences of different levels of coarse-graining of the molecular steps involved in transcription, finding that, at the level of mean gene expression, the different models cannot be distinguished. We then study higher moments of the gene expression distribution, which allows us to discard several of the models that disagree with experimental data, supporting a minimal kinetic model. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
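A compact check of the kinetic/thermodynamic correspondence at the level of mean expression, using a Gillespie simulation of the two-state simple-repression motif and the thermodynamic fold-change formula; the rate values are arbitrary, and the paper's analysis of variance and full distributions goes well beyond this.

import numpy as np

def mean_mrna(k_on, k_off, r, gamma, t_max=5000.0, seed=0):
    # Gillespie simulation of simple repression: the promoter toggles between repressor-bound
    # and free; mRNA is produced only when the promoter is free and decays at rate gamma
    rng = np.random.default_rng(seed)
    t, free, m, acc = 0.0, True, 0, 0.0
    while t < t_max:
        rates = [k_on if free else 0.0, k_off if not free else 0.0, r if free else 0.0, gamma * m]
        total = sum(rates)
        dt = rng.exponential(1.0 / total)
        acc += m * dt
        t += dt
        u = rng.random() * total
        if u < rates[0]:
            free = False
        elif u < rates[0] + rates[1]:
            free = True
        elif u < rates[0] + rates[1] + rates[2]:
            m += 1
        else:
            m -= 1
    return acc / t          # time-averaged mRNA copy number

k_on, k_off, r, gamma = 2.0, 0.5, 10.0, 1.0      # repressor binding/unbinding, production, decay
fc_kinetic = mean_mrna(k_on, k_off, r, gamma) / mean_mrna(0.0, k_off, r, gamma)
# thermodynamic fold-change 1/(1 + (R/N_NS)exp(-beta*dEps)), with k_on/k_off as that weight
fc_thermo = 1.0 / (1.0 + k_on / k_off)
print(f"fold-change: simulated {fc_kinetic:.3f} vs thermodynamic {fc_thermo:.3f}")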
240. A novel virtual screening procedure identifies Pralatrexate as inhibitor of SARS-CoV-2 RdRp and it reduces viral replication in vitro.
- Author
-
Zhang, Haiping, Yang, Yang, Li, Junxin, Wang, Min, Saravanan, Konda Mani, Wei, Jinli, Tze-Yang Ng, Justin, Tofazzal Hossain, Md., Liu, Maoxuan, Zhang, Huiling, Ren, Xiaohu, Pan, Yi, Peng, Yin, Shi, Yi, Wan, Xiaochun, Liu, Yingxia, and Wei, Yanjie
- Subjects
SARS-CoV-2 ,ANTIVIRAL agents ,AZITHROMYCIN ,RNA replicase ,VIRAL replication ,COVID-19 ,MOLECULAR dynamics - Abstract
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses serious threats to global public health and has led to a worldwide crisis. No effective drug or vaccine is readily available. The viral RNA-dependent RNA polymerase (RdRp) is a promising therapeutic target. A hybrid drug screening procedure was proposed and applied to identify potential drug candidates targeting RdRp from 1906 approved drugs. Among the four selected market-available drug candidates, Pralatrexate and Azithromycin were confirmed to effectively inhibit SARS-CoV-2 replication in vitro with EC50 values of 0.008 μM and 9.453 μM, respectively. For the first time, our study discovered that Pralatrexate is able to potently inhibit SARS-CoV-2 replication with a stronger inhibitory activity than Remdesivir within the same experimental conditions. The paper demonstrates the feasibility of fast and accurate anti-viral drug screening for inhibitors of SARS-CoV-2 and provides potential therapeutic agents against COVID-19. Author summary: Currently, a novel coronavirus called SARS-CoV-2 is spreading across many parts of the world. Unfortunately, there is still a lack of effective drugs against the virus. Additionally, it usually takes a much longer time to develop a new drug using traditional methods. Thus, it is now better to rely on alternative methods to develop drugs that can treat such a disease effectively. In this paper, we have proposed a hybrid drug screening procedure based on deep learning and molecular dynamics simulation for identifying potential drug candidates targeting RdRp from 1906 market-available drugs. Our screening has successfully identified an FDA-approved drug called Pralatrexate that strongly inhibits the replication of SARS-CoV-2 in vitro with an EC50 value of 0.008 μM. This work demonstrates the feasibility of accurate virtual drug screening for inhibitors of SARS-CoV-2 and provides potential therapeutic agents against COVID-19. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
241. Breaking the circularity in circular analyses: Simulations and formal treatment of the flattened average approach.
- Author
-
Bowman, Howard, Brooks, Joseph L., Hajilou, Omid, Zoumpoulaki, Alexia, and Litvak, Vladimir
- Subjects
SCIENTIFIC literature ,STATISTICAL power analysis ,MATHEMATICAL proofs ,FALSE positive error ,COGNITIVE neuroscience ,COGNITIVE psychology - Abstract
There has been considerable debate and concern as to whether there is a replication crisis in the scientific literature. A likely cause of poor replication is the multiple comparisons problem. An important way in which this problem can manifest in the M/EEG context is through post hoc tailoring of analysis windows (a.k.a. regions-of-interest, ROIs) to landmarks in the collected data. Post hoc tailoring of ROIs is used because it allows researchers to adapt to inter-experiment variability and discover novel differences that fall outside of windows defined by prior precedent, thereby reducing Type II errors. However, this approach can dramatically inflate Type I error rates. One way to avoid this problem is to tailor windows according to a contrast that is orthogonal (strictly parametrically orthogonal) to the contrast being tested. A key approach of this kind is to identify windows on a fully flattened average. On the basis of simulations, this approach has been argued to be safe for post hoc tailoring of analysis windows under many conditions. Here, we present further simulations and mathematical proofs to show exactly why the Fully Flattened Average approach is unbiased, providing a formal grounding to the approach, clarifying the limits of its applicability and resolving published misconceptions about the method. We also provide a statistical power analysis, which shows that, in specific contexts, the fully flattened average approach provides higher statistical power than Fieldtrip cluster inference. This suggests that the Fully Flattened Average approach will enable researchers to identify more effects from their data without incurring an inflation of the false positive rate. Author summary: It is clear from recent replicability studies that the replication rate in psychology and cognitive neuroscience is not high. One reason for this is that the noise in high dimensional neuroimaging data sets can "look-like" signal. A classic manifestation would be selecting a region in the data volume where an effect is biggest and then specifically reporting results on that region. There is a key trade-off in the selection of such regions of interest: liberal selection will inflate false positive rates, but conservative selection (e.g. strictly on the basis of prior precedent in the literature) can reduce statistical power, causing real effects to be missed. We propose a means to reconcile these two possibilities, by which regions of interest can be tailored to the pattern in the collected data, while not inflating false-positive rates. This is based upon generating what we call the Flattened Average. Critically, we validate the correctness of this method both in (ground-truth) simulations and with formal mathematical proofs. Given the replication "crisis", there may be no more important issue in psychology and cognitive neuroscience than improving the application of methods. This paper makes a valuable contribution to this improvement. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
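The entry above argues that selecting an analysis window from a contrast orthogonal to the one being tested avoids circularity. The sketch below illustrates the general principle on synthetic data: the window is chosen from the average pooled over both conditions (to which the condition difference is orthogonal), and the A-versus-B test is then run only inside that window. It is a simplified illustration of the idea, not the authors' exact Fully Flattened Average procedure; data shapes and parameters are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic ERP-like data: trials x timepoints for two conditions (assumed shapes).
n_trials, n_time = 40, 200
t = np.arange(n_time)
component = np.exp(-0.5 * ((t - 100) / 10.0) ** 2)      # shared evoked component
cond_a = 1.2 * component + rng.normal(0, 1, (n_trials, n_time))
cond_b = 0.8 * component + rng.normal(0, 1, (n_trials, n_time))

# Select the window on the average pooled over both conditions; because both
# conditions contribute equally, this selection is orthogonal to the A-vs-B contrast.
pooled = np.vstack([cond_a, cond_b]).mean(axis=0)
peak = int(np.argmax(np.abs(pooled)))
window = slice(max(peak - 10, 0), min(peak + 10, n_time))

# Test the condition difference only inside the selected window.
a_scores = cond_a[:, window].mean(axis=1)
b_scores = cond_b[:, window].mean(axis=1)
t_stat, p_val = stats.ttest_ind(a_scores, b_scores)
print(f"window {window.start}-{window.stop}: t = {t_stat:.2f}, p = {p_val:.4f}")
```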
242. Ten simple rules for developing good reading habits during graduate school and beyond.
- Author
-
Méndez, Marcos
- Subjects
READING ,HABIT formation ,COMPREHENSION ,HISTORY ,PERIODICALS ,BIBLIOGRAPHICAL citations - Abstract
The author talks about several rules that a person can follow to develop good reading habits in graduate school and beyond. Topics discussed include the importance of developing the habit of reading on a daily basis; the need to develop comprehension skills; and the need to study the history of one's discipline. Also mentioned are the importance of creating a list of relevant journals, the need to read books, and the benefits of using a reference manager.
- Published
- 2018
- Full Text
- View/download PDF
243. A modeling study of budding yeast colony formation and its relationship to budding pattern and aging.
- Author
-
Wang, Yanli, Lo, Wing-Cheong, and Chou, Ching-Shan
- Subjects
YEAST fungi genetics ,BUDDING (Zoology) ,ELECTRIC properties of cells ,HAPLOIDY ,DIPLOIDY - Abstract
Budding yeast, which undergoes polarized growth during budding and mating, has been a useful model system to study cell polarization. Bud sites are selected differently in haploid and diploid yeast cells: haploid cells bud in an axial manner, while diploid cells bud in a bipolar manner. While previous studies have focused on the molecular details of bud site selection and polarity establishment, not much is known about how different budding patterns give rise to different functions at the population level. In this paper, we develop a two-dimensional agent-based model to study budding yeast colonies with cell-type specific biological processes, such as budding, mating, mating type switch, consumption of nutrients, and cell death. The model demonstrates that the axial budding pattern enhances mating probability at an early stage and the bipolar budding pattern improves colony development under nutrient limitation. Our results suggest that the frequency of mating type switch might control the trade-off between diploidization and inbreeding. The effect of cellular aging is also studied through our model. Based on the simulations, colonies initiated by an aged haploid cell show a reduced mating probability at an early stage and recover as the rejuvenated offspring become the majority. Colonies initiated with aged diploid cells do not show a disadvantage in colony expansion, possibly because young cells contribute the most to colony expansion. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
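The entry above describes a 2D agent-based model of budding yeast colonies with axial versus bipolar budding. The toy sketch below captures only the bud-site rule (same pole every time versus alternating poles) for point-like cells with no nutrients, mating, collisions, or aging; all parameters are invented, so it should be read as a caricature of the modeling approach rather than the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)
CELL_RADIUS = 1.0  # arbitrary unit

def grow_colony(pattern, n_generations=5):
    """Toy budding colony: every cell buds once per generation.

    pattern: 'axial'   -> daughters always placed at the same pole
             'bipolar' -> daughters alternate between the two poles
    Cells are points that may overlap; nutrients, mating and aging are ignored.
    """
    cells = [(0.0, 0.0, rng.uniform(0, 2 * np.pi), 0)]  # (x, y, axis angle, bud count)
    for _ in range(n_generations):
        daughters = []
        for i, (x, y, axis, buds) in enumerate(cells):
            if pattern == 'axial':
                angle = axis
            else:
                angle = axis + np.pi * (buds % 2)
            jitter = rng.normal(0, 0.2)
            dx = 2 * CELL_RADIUS * np.cos(angle + jitter)
            dy = 2 * CELL_RADIUS * np.sin(angle + jitter)
            daughters.append((x + dx, y + dy, angle, 0))
            cells[i] = (x, y, axis, buds + 1)
        cells.extend(daughters)
    return np.array([(c[0], c[1]) for c in cells])

for pattern in ('axial', 'bipolar'):
    pos = grow_colony(pattern)
    extent = pos.max(axis=0) - pos.min(axis=0)
    print(pattern, 'cells:', len(pos), 'bounding box:', np.round(extent, 1))
```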
244. Mutual influence between language and perception in multi-agent communication games.
- Author
-
Ohmer, Xenia, Marino, Michael, Franke, Michael, and König, Peter
- Subjects
ARTIFICIAL neural networks ,VISUAL perception ,REINFORCEMENT learning ,VISUAL learning ,DEEP learning ,COGNITION - Abstract
Language interfaces with many other cognitive domains. This paper explores how interactions at these interfaces can be studied with deep learning methods, focusing on the relation between language emergence and visual perception. To model the emergence of language, a sender and a receiver agent are trained on a reference game. The agents are implemented as deep neural networks, with dedicated vision and language modules. Motivated by the mutual influence between language and perception in cognition, we apply systematic manipulations to the agents' (i) visual representations, to analyze the effects on emergent communication, and (ii) communication protocols, to analyze the effects on visual representations. Our analyses show that perceptual biases shape semantic categorization and communicative content. Conversely, if the communication protocol partitions object space along certain attributes, agents learn to represent visual information about these attributes more accurately, and the representations of communication partners align. Finally, an evolutionary analysis suggests that visual representations may be shaped in part to facilitate the communication of environmentally relevant distinctions. Aside from accounting for co-adaptation effects between language and perception, our results point out ways to modulate and improve visual representation learning and emergent communication in artificial agents. Author summary: Language is grounded in the world and used to coordinate and achieve common objectives. We simulate grounded, interactive language use with a communication game. A sender refers to an object in the environment and if the receiver selects the correct object both agents are rewarded. By practicing the game, the agents develop their own communication protocol. We use this setup to study interactions between emerging language and visual perception. Agents are implemented as neural networks with dedicated vision modules to process images of objects. By manipulating their visual representations we can show how variations in perception are reflected in linguistic variations. Conversely, we demonstrate that differences in language are reflected in the agents' visual representations. Our simulations mirror several empirically observed phenomena: labels for concrete objects and properties (e.g., "striped", "bowl") group together visually similar objects, object representations adapt to the categories imposed by language, and representational spaces between communication partners align. In addition, an evolutionary analysis suggests that visual representations may be shaped, in part, to facilitate communication about environmentally relevant information. In sum, we use communication games with neural network agents to model co-adaptation effects between language and visual perception. Future work could apply this computational framework to other interfaces between language and cognition. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
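The entry above trains deep sender and receiver networks on a reference game. The sketch below is a deliberately tiny tabular analogue of such a game, with no vision modules: the sender maps a target object to a discrete message, the receiver picks among two candidates, and both are updated by REINFORCE on a shared reward. Object, vocabulary, and training sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBJECTS, VOCAB, LR, EPISODES = 5, 5, 0.1, 20000

W_send = np.zeros((N_OBJECTS, VOCAB))   # sender: logits over messages per target
W_recv = np.zeros((VOCAB, N_OBJECTS))   # receiver: score per object given message

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0
for _ in range(EPISODES):
    target, distractor = rng.choice(N_OBJECTS, size=2, replace=False)
    candidates = np.array([target, distractor])

    # Sender samples a message for the target; receiver picks a candidate.
    p_msg = softmax(W_send[target])
    msg = rng.choice(VOCAB, p=p_msg)
    p_choice = softmax(W_recv[msg, candidates])
    pick = rng.choice(2, p=p_choice)
    reward = 1.0 if candidates[pick] == target else 0.0

    # REINFORCE updates with a running-average baseline.
    adv = reward - baseline
    baseline += 0.01 * (reward - baseline)
    grad_s = -p_msg
    grad_s[msg] += 1.0
    W_send[target] += LR * adv * grad_s
    grad_r = -p_choice
    grad_r[pick] += 1.0
    W_recv[msg, candidates] += LR * adv * grad_r

# Greedy communication accuracy after training.
correct = 0
for _ in range(1000):
    target, distractor = rng.choice(N_OBJECTS, size=2, replace=False)
    msg = int(np.argmax(W_send[target]))
    correct += int(np.argmax(W_recv[msg, [target, distractor]]) == 0)
print('accuracy:', correct / 1000)
```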
245. Novel methods for estimating the instantaneous and overall COVID-19 case fatality risk among care home residents in England.
- Author
-
Overton, Christoper E., Webb, Luke, Datta, Uma, Fursman, Mike, Hardstaff, Jo, Hiironen, Iina, Paranthaman, Karthik, Riley, Heather, Sedgwick, James, Verne, Julia, Wilner, Steve, Pellis, Lorenzo, and Hall, Ian
- Subjects
COVID-19 pandemic ,NURSING care facilities ,FRAIL elderly ,MEDICAL quality control ,NURSING home care ,COMMUNITIES - Abstract
The COVID-19 pandemic has caused high mortality among the elderly and frail worldwide, particularly in care homes. This is driven by the difficulty of isolating care homes from the wider community, the large population sizes within care facilities (relative to typical households), and the age/frailty of the residents. The case fatality risk (CFR) is an important tool for quantifying the mortality risk posed by the disease. It quantifies the proportion of cases that result in death. Throughout the pandemic, CFR amongst care home residents in England has been monitored closely. To estimate CFR, we apply both novel and existing methods to data on deaths in care homes, collected by Public Health England and the Care Quality Commission. We compare these different methods, evaluating their relative strengths and weaknesses. Using these methods, we estimate temporal trends in the instantaneous CFR (at both daily and weekly resolutions) and the overall CFR across the whole of England, and disaggregated at the regional level. We also investigate how the CFR varies based on age and on the type of care required, disaggregating by whether care homes include nursing staff and by age of residents. This work has contributed to the summary of measures used for monitoring the UK epidemic. Author summary: During an epidemic, the case fatality risk (CFR), i.e. the probability that an individual dies after testing positive for a disease, is a key parameter informing the public health response. However, calculating the CFR is not trivial, since there are cases who may die in the future but have not died yet. Therefore, statistical methods are required to correct for the distribution of times between testing positive and dying. In this paper, we derive multiple methods, some existing and some novel, within a consistent methodological framework. This allows us to understand how these different approaches are related and their relative strengths and weaknesses. During the COVID-19 pandemic, care homes have been particularly affected, due to the high risk of COVID-19-associated mortality in the frail and elderly. We apply our CFR methods to data from English care homes to analyse changes in the care home CFR throughout the pandemic. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
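The entry above estimates case fatality risk while correcting for the delay between case report and death. The sketch below illustrates the standard delay-adjustment idea that motivates such methods (weighting each case by the probability that a fatal outcome would already have been observed), on simulated data with an assumed gamma onset-to-death delay; it is not the authors' estimator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical outbreak observed up to day 70, with a peak in recent cases.
days = 70
day_idx = np.arange(days)
cases = np.round(100 * np.exp(-0.5 * ((day_idx - 55) / 10.0) ** 2)).astype(int)
true_cfr, mean_delay = 0.2, 10.0
delay_dist = stats.gamma(a=4.0, scale=mean_delay / 4.0)   # onset-to-death delay

# Simulate deaths: each fatal case dies after a random delay; deaths falling
# beyond the observation window are (correctly) never observed.
deaths_observed = 0
for day, n in enumerate(cases):
    n_fatal = rng.binomial(n, true_cfr)
    delays = rng.gamma(shape=4.0, scale=mean_delay / 4.0, size=n_fatal)
    deaths_observed += np.sum(day + np.round(delays) < days)

# Naive CFR ignores cases whose outcome is not yet known.
naive_cfr = deaths_observed / cases.sum()

# Delay-adjusted CFR: weight each case by the probability that, if fatal,
# its death would already have been observed by the end of the data.
weights = delay_dist.cdf(days - 1 - day_idx)
adjusted_cfr = deaths_observed / (cases * weights).sum()

print(f"true CFR {true_cfr:.2f}  naive {naive_cfr:.3f}  delay-adjusted {adjusted_cfr:.3f}")
```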
246. CNAViz: An interactive webtool for user-guided segmentation of tumor DNA sequencing data.
- Author
-
Lalani, Zubair, Chu, Gillian, Hsu, Silas, Kagawa, Shaw, Xiang, Michael, Zaccaria, Simone, and El-Kebir, Mohammed
- Subjects
DNA sequencing ,CRITICAL currents ,QUALITY control ,BREAST cancer ,CANCER research ,NUCLEOTIDE sequencing ,GENOME editing - Abstract
Copy-number aberrations (CNAs) are genetic alterations that increase or decrease the number of copies of large genomic segments. Although they are ubiquitous in cancer and, thus, a critical area of current cancer research, CNA identification from DNA sequencing data is challenging because it requires partitioning of the genome into complex segments with the same copy-number states that may not be contiguous. Existing segmentation algorithms address these challenges either by leveraging the local information among neighboring genomic regions, or by globally grouping genomic regions that are affected by similar CNAs across the entire genome. However, both approaches have limitations: overclustering in the case of local segmentation, or the omission of clusters corresponding to focal CNAs in the case of global segmentation. Importantly, inaccurate segmentation will lead to inaccurate identification of CNAs. For this reason, most pan-cancer research studies rely on manual procedures of quality control and anomaly correction. To improve copy-number segmentation, we introduce CNAViz, a web-based tool that enables the user to simultaneously perform local and global segmentation, thus overcoming the limitations of each approach. Using simulated data, we demonstrate that by several metrics, CNAViz allows the user to obtain more accurate segmentation relative to existing local and global segmentation methods. Moreover, we analyze six bulk DNA sequencing samples from three breast cancer patients. By validating with parallel single-cell DNA sequencing data from the same samples, we show that by using CNAViz, our user was able to obtain more accurate segmentation and improved accuracy in downstream copy-number calling. Author summary: Copy-number aberrations (CNAs) are large genetic alterations that are pervasive in cancer and, therefore, have been the focus of several cancer research studies. Copy-number segmentation is a key step in the process of CNA identification, which consists of partitioning the genome into genomic segments with the same copy-number state. However, segmentation is challenging and the limitations of current segmentation algorithms lead to inaccuracies in the characterization of CNAs. In this paper, we introduce CNAViz, an interactive web-based tool that enables the user to edit segmentation solutions and overcome current limitations. We demonstrate the ability of a user to use CNAViz to improve segmentation solutions on both simulated and real data, analyzing six published bulk DNA sequencing samples from three breast cancer patients. Finally, we demonstrate that these improvements in segmentation solutions improve accuracy in downstream copy-number calling, enabling more accurate analyses of intra-tumor heterogeneity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
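The entry above contrasts local and global copy-number segmentation. The sketch below shows a bare-bones version of the global strategy: clustering genomic bins by read-depth ratio and B-allele frequency with k-means, then collapsing the per-bin labels into contiguous segments. The data are synthetic and the clustering choice is an assumption; CNAViz itself is an interactive webtool, not this script.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Synthetic per-bin read-depth ratio (RDR) and B-allele frequency (BAF) along
# one chromosome, generated from three underlying copy-number states.
true_states = np.repeat([0, 1, 2, 1], [100, 60, 40, 100])
state_rdr = np.array([1.0, 1.5, 0.5])
state_baf = np.array([0.50, 0.33, 0.50])
rdr = state_rdr[true_states] + rng.normal(0, 0.05, true_states.size)
baf = state_baf[true_states] + rng.normal(0, 0.02, true_states.size)

# "Global" segmentation: cluster bins genome-wide by (RDR, BAF), ignoring position.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    np.column_stack([rdr, baf]))

# Collapse per-bin cluster labels into contiguous segments (start, end, cluster).
segments, start = [], 0
for i in range(1, labels.size + 1):
    if i == labels.size or labels[i] != labels[start]:
        segments.append((start, i, int(labels[start])))
        start = i
print('segments found:', len(segments))
print(segments)
```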
247. Correcting the hebbian mistake: Toward a fully error-driven hippocampus.
- Author
-
Zheng, Yicong, Liu, Xiaonan L., Nishiyama, Satoru, Ranganath, Charan, and O'Reilly, Randall C.
- Subjects
THETA rhythm ,DENTATE gyrus ,HIPPOCAMPUS (Brain) ,ENTORHINAL cortex ,EPISODIC memory ,RETRIEVAL practice - Abstract
The hippocampus plays a critical role in the rapid learning of new episodic memories. Many computational models propose that the hippocampus is an autoassociator that relies on Hebbian learning (i.e., "cells that fire together, wire together"). However, Hebbian learning is computationally suboptimal as it does not learn in a way that is driven toward, and limited by, the objective of achieving effective retrieval. Thus, Hebbian learning results in more interference and a lower overall capacity. Our previous computational models have utilized a powerful, biologically plausible form of error-driven learning in hippocampal CA1 and entorhinal cortex (EC) (functioning as a sparse autoencoder) by contrasting local activity states at different phases in the theta cycle. Based on specific neural data and a recent abstract computational model, we propose a new model called Theremin (Total Hippocampal ERror MINimization) that extends error-driven learning to area CA3—the mnemonic heart of the hippocampal system. In the model, CA3 responds to the EC monosynaptic input prior to the EC disynaptic input through dentate gyrus (DG), giving rise to a temporal difference between these two activation states, which drives error-driven learning in the EC→CA3 and CA3↔CA3 projections. In effect, DG serves as a teacher to CA3, correcting its patterns into more pattern-separated ones, thereby reducing interference. Results showed that Theremin, compared with our original Hebbian-based model, has significantly increased capacity and learning speed. The model makes several novel predictions that can be tested in future studies. Author summary: Exemplified by the famous case of patient H.M. (Henry Molaison) whose hippocampus was surgically removed, the hippocampus is critical for learning and remembering everyday events—what is typically called "episodic memory." The dominant theory for how it learns is based on the intuitive principle stated by Donald Hebb in 1949, that neurons that "fire together, wire together"—when two neurons are active at the same time, the strength of their connection increases. We show in this paper that using a different form of learning based on correcting errors (error-driven learning) results in significantly improved episodic memory function in a biologically-based computational model of the hippocampus. This model also provides a significantly better account of behavioral data on the testing effect, where learning by testing with partial cues is better than learning with the complete set of information. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
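The entry above argues that error-driven learning interferes less than Hebbian learning in an autoassociative memory. The sketch below illustrates that contrast on random sparse patterns, comparing a pure Hebbian outer-product rule with a delta (error-driven) rule; it is a generic textbook-style comparison, not the Theremin model, and pattern sizes and learning rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, n_patterns, lr, epochs = 50, 20, 0.1, 50

# Random sparse binary patterns to be auto-associated (input == target).
patterns = (rng.random((n_patterns, n_units)) < 0.1).astype(float)

def recall_accuracy(W):
    recalled = (patterns @ W) > 0.5
    return (recalled == patterns.astype(bool)).mean()

# Hebbian learning: "fire together, wire together", with no error signal.
W_hebb = np.zeros((n_units, n_units))
for x in patterns:
    W_hebb += lr * np.outer(x, x)

# Error-driven (delta-rule) learning: weights change only to the extent that
# retrieval currently fails, which limits interference between patterns.
W_err = np.zeros((n_units, n_units))
for _ in range(epochs):
    for x in patterns:
        error = x - x @ W_err
        W_err += lr * np.outer(x, error)

print('Hebbian recall accuracy:     ', round(float(recall_accuracy(W_hebb)), 3))
print('Error-driven recall accuracy:', round(float(recall_accuracy(W_err)), 3))
```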
248. Probabilistic edge weights fine-tune Boolean network dynamics.
- Author
-
Deritei, Dávid, Kunšič, Nina, and Csermely, Péter
- Subjects
BIOLOGICAL systems ,DYNAMICAL systems ,BINDING sites ,PROTEIN binding ,BOOLEAN functions ,NOISE - Abstract
Biological systems are noisy by nature. This aspect is reflected in our experimental measurements and should be reflected in the models we build to better understand these systems. Noise can be especially consequential when trying to interpret specific regulatory interactions, i.e. regulatory network edges. In this paper, we propose a method to explicitly encode edge-noise in Boolean dynamical systems by probabilistic edge-weight (PEW) operators. PEW operators have two important features: first, they introduce a form of edge-weight into Boolean models through the noise; second, the noise depends on the dynamical state of the system, which enables more biologically meaningful modeling choices. Moreover, we offer a simple-to-use implementation in the already well-established BooleanNet framework. In two application cases, we show how the introduction of just a few PEW operators in Boolean models can fine-tune the emergent dynamics and increase the accuracy of qualitative predictions. This includes fine-tuning interactions which cause non-biological behaviors when switching between asynchronous and synchronous update schemes in dynamical simulations. Moreover, PEW operators also open the way to encoding more exotic cellular dynamics, such as cellular learning, and to implementing edge-weights for regulatory networks inferred from omics data. Author summary: The life and decision-making of cells is regulated by a complex web of dynamically interacting molecules. The strength and nature of individual interactions are very diverse, and it is especially important to understand such diversity when it comes to defects and disease. For example, the mutation of a protein binding site can critically alter the probability and strength of its interactions with its binding partners. Boolean network models have become an increasingly potent tool for understanding the complex dynamical interactions within cellular regulatory systems; however, there is no straightforward and explicit way to encode weights on individual interactions. In this paper we offer a way to add weights to interactions by simple noise operators which alter the behavior of edges (or groups of edges) in in-silico simulations of Boolean network models. We show with multiple applications that adding just a few PEW (probabilistic edge-weight) operators dramatically improves the biological plausibility of Boolean models and reproduces much more nuanced experimental results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
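The entry above introduces probabilistic edge-weight (PEW) operators for Boolean networks. The sketch below conveys the general flavor only: a single edge in a toy three-node network transmits its regulator's value with a given probability, and the long-run activity of the target changes with that weight. The network, update rule, and operator details are assumptions and do not reproduce the paper's BooleanNet implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

def pew(source_value, weight):
    """Toy probabilistic edge-weight operator: the regulator's value is passed
    along the edge with probability `weight`; otherwise the edge is silent."""
    return source_value if rng.random() < weight else False

def step(state, w_ab):
    """One synchronous update of an assumed 3-node network:
       A = not C;  B = PEW(A, w_ab) or C;  C = A and B."""
    a, b, c = state
    return (not c, pew(a, w_ab) or c, a and b)

# How often node B is ON in the long run, as a function of the edge weight.
for w in (0.1, 0.5, 0.9):
    state, b_on = (True, False, False), 0
    for _ in range(10000):
        state = step(state, w)
        b_on += state[1]
    print(f"A->B edge weight {w}: B active in {b_on / 10000:.2f} of steps")
```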
249. Credibility assessment of patient-specific computational modeling using patient-specific cardiac modeling as an exemplar.
- Author
-
Galappaththige, Suran, Gray, Richard A., Costa, Caroline Mendonca, Niederer, Steven, and Pathmanathan, Pras
- Subjects
DIGITAL twins ,SIMULATED patients ,INDIVIDUALIZED medicine ,PRODUCT safety ,MEDICAL equipment ,DISEASE progression ,MEDICAL supplies - Abstract
Reliable and robust simulation of individual patients using patient-specific models (PSMs) is one of the next frontiers for modeling and simulation (M&S) in healthcare. PSMs, which form the basis of digital twins, can be employed as clinical tools to, for example, assess disease state, predict response to therapy, or optimize therapy. They may also be used to construct virtual cohorts of patients, for in silico evaluation of medical product safety and/or performance. Methods and frameworks have recently been proposed for evaluating the credibility of M&S in healthcare applications. However, such efforts have generally been motivated by models of medical devices or generic patient models; how best to evaluate the credibility of PSMs has largely been unexplored. The aim of this paper is to understand and demonstrate the credibility assessment process for PSMs using patient-specific cardiac electrophysiological (EP) modeling as an exemplar. We first review approaches used to generate cardiac PSMs and consider how verification, validation, and uncertainty quantification (VVUQ) apply to cardiac PSMs. Next, we execute two simulation studies using a publicly available virtual cohort of 24 patient-specific ventricular models, the first a multi-patient verification study, the second investigating the impact of uncertainty in personalized and non-personalized inputs in a virtual cohort. We then use the findings from our analyses to identify how important characteristics of PSMs can be considered when assessing credibility with the approach of the ASME V&V40 Standard, accounting for PSM concepts such as inter- and intra-user variability, multi-patient and "every-patient" error estimation, uncertainty quantification in personalized vs non-personalized inputs, clinical validation, and others. The results of this paper will be useful to developers of cardiac and other medical image based PSMs, when assessing PSM credibility. Author summary: Patient-specific models are computational models that have been personalized using data from a patient. After decades of research, recent computational, data science and healthcare advances have opened the door to the fulfilment of the enormous potential of such models, from truly personalized medicine to efficient and cost-effective testing of new medical products. However, reliability (credibility) of patient-specific models is key to their success, and there are currently no general guidelines for evaluating credibility of patient-specific models. Here, we consider how frameworks and model evaluation activities that have been developed for generic (not patient-specific) computational models, can be extended to patient specific models. We achieve this through a detailed analysis of the activities required to evaluate cardiac electrophysiological models, chosen as an exemplar field due to its maturity and the complexity of such models. This is the first paper on the topic of reliability of patient-specific models and will help pave the way to reliable and trusted patient-specific modeling across healthcare applications. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
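The entry above discusses uncertainty quantification for personalized versus non-personalized inputs of patient-specific models. The sketch below shows the simplest Monte Carlo version of that exercise with a toy stand-in for a simulator: one input is sampled from an assumed patient-specific uncertainty distribution, the other is held fixed, and the induced spread in the output is summarized. All functions, values, and units are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def toy_model(conductivity, stimulus_amplitude):
    """Toy stand-in for a patient-specific simulator: a scalar 'activation time'
    that decreases with tissue conductivity and stimulus strength (made-up formula)."""
    return 100.0 / (np.sqrt(conductivity) * (1.0 + 0.1 * stimulus_amplitude))

# Personalized input (estimated from patient data, hence uncertain) versus a
# non-personalized input (fixed from literature and treated as exact here).
n_samples = 5000
conductivity = rng.normal(loc=1.2, scale=0.15, size=n_samples)  # personalized, uncertain
stimulus = np.full(n_samples, 2.0)                               # non-personalized, fixed

outputs = toy_model(conductivity, stimulus)
lo, hi = np.percentile(outputs, [2.5, 97.5])
print(f"activation time: {outputs.mean():.1f} (95% interval {lo:.1f}-{hi:.1f}, arbitrary units)")
```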
250. Efficient Bayesian inference for stochastic agent-based models.
- Author
-
Jørgensen, Andreas Christ Sølvsten, Ghosh, Atiyo, Sturrock, Marc, and Shahrezaei, Vahid
- Subjects
BAYESIAN field theory ,STOCHASTIC models ,MACHINE learning ,INFERENTIAL statistics ,INFECTIOUS disease transmission - Abstract
The modelling of many real-world problems relies on computationally heavy simulations of randomly interacting individuals or agents. However, the values of the parameters that underlie the interactions between agents are typically poorly known, and hence they need to be inferred from macroscopic observations of the system. Since statistical inference rests on repeated simulations to sample the parameter space, the high computational expense of these simulations can become a stumbling block. In this paper, we compare two ways to mitigate this issue in a Bayesian setting through the use of machine learning methods: One approach is to construct lightweight surrogate models to substitute the simulations used in inference. Alternatively, one might altogether circumvent the need for Bayesian sampling schemes and directly estimate the posterior distribution. We focus on stochastic simulations that track autonomous agents and present two case studies: tumour growths and the spread of infectious diseases. We demonstrate that good accuracy in inference can be achieved with a relatively small number of simulations, making our machine learning approaches orders of magnitude faster than classical simulation-based methods that rely on sampling the parameter space. However, we find that while some methods generally produce more robust results than others, no algorithm offers a one-size-fits-all solution when attempting to infer model parameters from observations. Instead, one must choose the inference technique with the specific real-world application in mind. The stochastic nature of the considered real-world phenomena poses an additional challenge that can become insurmountable for some approaches. Overall, we find machine learning approaches that create direct inference machines to be promising for real-world applications. We present our findings as general guidelines for modelling practitioners. Author summary: Computer simulations play a vital role in modern science as they are commonly used to compare theory with observations. One can infer the properties of a system by comparing the data to the predicted behaviour in different scenarios. Each scenario corresponds to a simulation with slightly different settings. However, since real-world problems are highly complex, the simulations often require extensive computational resources, making direct comparisons with data challenging, if not insurmountable. It is, therefore, necessary to resort to inference methods that mitigate this issue, but it is not clear-cut what path to choose for any specific research problem. In this paper, we provide general guidelines for how to make this choice. We do so by studying examples from oncology and epidemiology and by taking advantage of machine learning. More specifically, we focus on simulations that track the behaviour of autonomous agents, such as single cells or individuals. We show that the best way forward is problem-dependent and highlight the methods that yield the most robust results across the different case studies. Rather than relying on a single inference technique, we recommend employing several methods and selecting the most reliable based on predetermined criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
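The entry above compares machine-learning shortcuts for Bayesian inference on stochastic agent-based models. The sketch below illustrates one of the two strategies it mentions, the surrogate route: a small stochastic agent-based outbreak simulator is emulated with a random-forest regressor, and approximate Bayesian rejection sampling is run against the cheap surrogate. The simulator, priors, and tolerance are all invented assumptions, not the authors' case studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

def simulate_outbreak(beta, n_agents=200, n_steps=150, gamma=0.1):
    """Tiny stochastic agent-based SIR model; returns the final attack rate."""
    state = np.zeros(n_agents, dtype=int)   # 0 = S, 1 = I, 2 = R
    state[:5] = 1
    for _ in range(n_steps):
        p_infect = 1.0 - np.exp(-beta * (state == 1).sum() / n_agents)
        sus = state == 0
        state[sus & (rng.random(n_agents) < p_infect)] = 1
        inf = state == 1
        state[inf & (rng.random(n_agents) < gamma)] = 2
    return (state > 0).mean()

# "Observed" summary statistic generated at a known transmission rate.
true_beta = 0.15
observed = np.mean([simulate_outbreak(true_beta) for _ in range(20)])

# Train a lightweight surrogate of the simulator on a modest parameter design.
train_beta = rng.uniform(0.05, 1.0, 300)
train_out = np.array([simulate_outbreak(b) for b in train_beta])
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(train_beta.reshape(-1, 1), train_out)

# ABC rejection sampling against the cheap surrogate instead of the simulator.
prior_draws = rng.uniform(0.05, 1.0, 20000)
predicted = surrogate.predict(prior_draws.reshape(-1, 1))
accepted = prior_draws[np.abs(predicted - observed) < 0.03]
print(f"true beta {true_beta}, surrogate-ABC posterior mean {accepted.mean():.2f} "
      f"({accepted.size} of {prior_draws.size} draws accepted)")
```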