Search Results (41 results)
2. Ten quick tips for sharing open genomic data.
- Author
-
Brown, Anne V., Campbell, Jacqueline D., Assefa, Teshale, Grant, David, Nelson, Rex T., Weeks, Nathan T., and Cannon, Steven B.
- Subjects
GENOMICS ,BIOLOGICAL databases ,NUCLEOTIDE sequencing ,DATA curation ,DNA data banks - Abstract
As sequencing prices drop, genomic data accumulates—seemingly at a steadily increasing pace. Most genomic data potentially have value beyond the initial purpose—but only if shared with the scientific community. This, of course, is often easier said than done. Some of the challenges in sharing genomic data include data volume (raw file sizes and number of files), complexities, formats, nomenclatures, metadata descriptions, and the choice of a repository. In this paper, we describe 10 quick tips for sharing open genomic data. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
3. Ten simple rules for writing statistical book reviews.
- Author
-
Lortie, Christopher J.
- Subjects
BOOK reviewing ,COMPUTATIONAL biology ,STATISTICS ,COMPUTER software ,LEARNING - Abstract
Statistical books can provide deep insights into statistics and software. There are, however, many resources available to the practitioner. Book reviews have the capacity to function as a critical mechanism for the learner to assess the merits of engaging in part, in full, or at all with a book. The “ten simple rules” format, pioneered in computational biology, was applied here to writing effective book reviews for statistics because of the wide breadth of offerings in this domain, including topical introductions, computational solutions, and theory. Learning by doing is a popular paradigm in statistics and computation, but there is still a niche for books in the pedagogy of self-taught and instruction-based learning. Primarily, these rules ensure that book reviews function as a form of short syntheses to inform and guide readers in deciding to use a specific book relative to other options for resolving statistical challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
4. The limitations, dangers, and benefits of simple methods for testing identifiability
- Author
-
Rob J. de Boer and Mario Castro
- Subjects
Computer and Information Sciences ,Theoretical Computer Science ,Pharmacokinetic Analysis ,Compartment Models ,Systems Science ,Formal Comment ,Differential Equations ,Software Engineering ,Computational Theory and Mathematics ,Modeling and Simulation ,Identifiability ,Nonlinear Systems ,Mathematics ,Neuroscience - Abstract
In their Commentary paper, Villaverde and Massonis (On testing structural identifiability by a simple scaling method: relying on scaling symmetries can be misleading) have commented on our paper in which we proposed a simple scaling method to test structural identifiability. Our scaling invariance method (SIM) tests for scaling symmetries only, and Villaverde and Massonis correctly show the SIM may fail to detect identifiability problems when a model has other types of symmetries. We agree with the limitations raised by these authors, but we also emphasize that the method is still valuable for its applicability to a wide variety of models, its simplicity, and even as a tool to introduce the problem of identifiability to investigators with little training in mathematics.
- Published
- 2021
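The scaling-symmetry idea behind the SIM described in this entry can be illustrated numerically. The sketch below is not the authors' implementation; the toy model dx/dt = p1*p2*x and all names are assumptions chosen for illustration. Because the model is invariant under the scaling (p1, p2) -> (lam*p1, p2/lam), the two parameters cannot be identified individually from the output:

```python
import numpy as np

def simulate(p1, p2, x0=1.0, dt=1e-3, steps=2000):
    """Euler-integrate the toy model dx/dt = p1 * p2 * x."""
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        x[i + 1] = x[i] + dt * p1 * p2 * x[i]
    return x

# Two parameter sets related by the scaling symmetry
# (p1, p2) -> (lam * p1, p2 / lam) with lam = 2:
traj_a = simulate(p1=2.0, p2=3.0)
traj_b = simulate(p1=4.0, p2=1.5)

# Identical outputs mean p1 and p2 are not individually identifiable;
# only their product is.
print(np.allclose(traj_a, traj_b))  # True
```

As the Formal Comment notes, a test of this kind detects scaling symmetries only; a model can pass it and still be structurally non-identifiable through other symmetry types.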
5. Hands-on training about overfitting
- Author
-
Janez Demšar and Blaž Zupan
- Subjects
Computer and Information Sciences ,Machine Learning ,Overfitting ,Data Mining ,Data Management ,Data Visualization ,Data Science ,Orange (software) ,Computational Biology ,Workflow ,Education ,Human Learning ,Artificial Intelligence ,Software Engineering ,Computational Theory and Mathematics ,Modeling and Simulation - Abstract
Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory-level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis. Author summary: Every teacher strives for an a-ha moment, a sudden revelation by the student who has gained a fundamental insight she will always remember. In the past years, the authors of this paper have been tailoring their courses in machine learning to include material that could lead students to such discoveries. We aim to expose machine learning to practitioners, not only computer scientists but also molecular biologists and students of biomedicine, that is, the end users of bioinformatics’ computational approaches. In this article, we lay out a course that aims to teach about overfitting, one of the key concepts in machine learning that needs to be understood, mastered, and avoided in data science applications. We propose a hands-on approach that uses an open-source workflow-based data science toolbox that combines data visualization and machine learning. In the proposed training about overfitting, we first deceive the students, then expose the problem, and finally challenge them to find the solution.
In the paper, we present three lessons in overfitting and associated data analysis workflows and motivate the use of introduced computation methods by relating them to concepts conveyed by instructors.
- Published
- 2021
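The deceive-then-expose pattern this entry describes can be sketched without any particular toolbox. In this hypothetical example (toy data; a 1-nearest-neighbour model stands in for any high-capacity learner), the labels are pure noise, so perfect training accuracy is nothing but memorisation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Features carry no information about the (random) labels,
# so any apparent fit is pure overfitting.
X_train = rng.normal(size=(100, 5))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(200, 5))
y_test = rng.integers(0, 2, size=200)

def predict_1nn(X_tr, y_tr, X):
    """1-nearest-neighbour: memorises the training set."""
    d = ((X[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return y_tr[d.argmin(axis=1)]

train_acc = (predict_1nn(X_train, y_train, X_train) == y_train).mean()
test_acc = (predict_1nn(X_train, y_train, X_test) == y_test).mean()

print(f"train accuracy: {train_acc:.2f}")  # 1.00 -- perfect memorisation
print(f"test accuracy:  {test_acc:.2f}")   # near 0.5 -- chance level
```

The gap between the two numbers is the "a-ha" the entry's training aims for: a held-out set exposes what the training set conceals.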
6. bigPint: A Bioconductor visualization package that makes big data pint-sized
- Author
-
Dianne Cook and Lindsay Rutter
- Subjects
Big Data ,Bioconductor ,Data Visualization ,Interactivity ,Computer Graphics ,Cluster Analysis ,Hierarchical Clustering ,Computational Biology ,Genomics ,RNA Analysis ,Reproducibility ,Pseudocode ,Source Code ,Software Engineering ,Software Tools - Abstract
Interactive data visualization is imperative in the biological sciences. The development of independent layers of interactivity has been an ongoing pursuit within the visualization community. We developed bigPint, a data visualization package available on Bioconductor under the GPL-3 license (https://bioconductor.org/packages/release/bioc/html/bigPint.html). Our software introduces new visualization technology that enables independent layers of interactivity using Plotly in R, which aids in the exploration of large biological datasets. The bigPint package presents modernized versions of scatterplot matrices, volcano plots, and litre plots through the implementation of layered interactivity. These graphics have detected normalization issues, differential expression designation problems, and common analysis errors in public RNA-sequencing datasets. Researchers can apply bigPint graphics to their data by following recommended pipelines written in reproducible code in the user manual. In this paper, we explain how we achieved the independent layers of interactivity that are behind bigPint graphics. Pseudocode and source code are provided. Computational scientists can leverage our open-source code to expand upon our layered interactive technology and/or apply it in new ways toward other computational biology tasks. Author summary: Biological disciplines face the challenge of increasingly large and complex data. One necessary approach toward eliciting information is data visualization. Newer visualization tools incorporate interactive capabilities that allow scientists to extract information more efficiently than static counterparts. In this paper, we introduce technology that allows multiple independent layers of interactive visualization written in open-source code. This technology can be repurposed across various biological problems.
Here, we apply this technology to RNA-sequencing data, a popular next-generation sequencing approach that provides snapshots of RNA quantity in biological samples at given moments in time. It can be used to investigate cellular differences between health and disease, cellular changes in response to external stimuli, and additional biological inquiries. RNA-sequencing data is large, noisy, and biased. It requires sophisticated normalization. The most popular open-source RNA-sequencing data analysis software focuses on models, with little emphasis on integrating effective visualization tools. This is despite sound evidence that RNA-sequencing data is most effectively explored using graphical and numerical approaches in a complementary fashion. The software we introduce can make it easier for researchers to use models and visuals in an integrated fashion during RNA-sequencing data analysis.
- Published
- 2020
7. Ten simple rules for helping newcomers become contributors to open projects.
- Author
-
Sholler, Dan, Steinmacher, Igor, Ford, Denae, Averick, Mara, Hoye, Mike, and Wilson, Greg
- Subjects
SOCIAL learning ,OPEN source software ,SCIENCE education ,HEBBIAN memory ,SOFTWARE engineering ,LIFE sciences ,SCIENCE & state ,HUMAN-computer interaction - Abstract
Open-source software projects are also communities of effort. "...developers who join an organization through these programs are half as likely to transition into long-term community members than developers who do not use these programs... although developers who do succeed through these programs find them valuable" (quoting Fagerholm F, Guinea AS, Münch J, and Borenstein J, "The role of mentoring and project characteristics for onboarding in open source software projects"). [Extracted from the article]
- Published
- 2019
- Full Text
- View/download PDF
8. Assessing key decisions for transcriptomic data integration in biochemical networks.
- Author
-
Richelle, Anne, Joshi, Chintan, and Lewis, Nathan E.
- Subjects
GENE expression ,DATA integration ,GENE regulatory networks ,OVERLAY networks ,COMPUTATIONAL biology ,METABOLIC models - Abstract
To gain insights into complex biological processes, genome-scale data (e.g., RNA-Seq) are often overlaid on biochemical networks. However, many networks do not have a one-to-one relationship between genes and network edges, due to the existence of isozymes and protein complexes. Therefore, decisions must be made on how to overlay data onto networks. For example, for metabolic networks, these decisions include (1) how to integrate gene expression levels using gene-protein-reaction rules, (2) the approach used for selection of thresholds on expression data to consider the associated gene as “active”, and (3) the order in which these steps are imposed. However, the influence of these decisions has not been systematically tested. We compared 20 decision combinations using a transcriptomic dataset across 32 tissues and showed that the definition of which reactions may be considered active (i.e., reactions of the GEM with a non-zero expression level after overlaying the data) is mainly influenced by the thresholding approach used. To determine the most appropriate decisions, we evaluated how these decisions impact the acquisition of tissue-specific active reaction lists that recapitulate organ-system tissue groups. These results will provide guidelines to improve data analyses with biochemical networks and facilitate the construction of context-specific metabolic models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
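Decision (1) in this entry, integrating expression through gene-protein-reaction (GPR) rules, is commonly implemented by taking min() over genes in an enzyme complex (AND) and max() over isozymes (OR), then thresholding to call reactions active. A minimal sketch under that common convention (the gene names, values, and threshold are illustrative assumptions, not from the paper):

```python
# Hypothetical expression levels for four genes.
expression = {"geneA": 8.0, "geneB": 0.5, "geneC": 6.0, "geneD": 2.0}

# GPR rules encoded as nested tuples: ("AND", ...) / ("OR", ...) / gene name.
gpr = {
    "R1": ("AND", "geneA", "geneB"),   # complex: limited by its weakest part
    "R2": ("OR", "geneC", "geneD"),    # isozymes: best-expressed one suffices
    "R3": "geneD",
}

def score(rule):
    """Collapse a GPR rule to one expression value (min for AND, max for OR)."""
    if isinstance(rule, str):
        return expression[rule]
    op, *parts = rule
    agg = min if op == "AND" else max
    return agg(score(p) for p in parts)

threshold = 1.0  # as the entry shows, this choice strongly shapes the result
active = {rxn for rxn, rule in gpr.items() if score(rule) > threshold}
print(sorted(active))  # ['R2', 'R3']
```

Swapping the order of thresholding and GPR aggregation, or moving the threshold, changes which reactions survive, which is exactly the sensitivity the entry systematically tests.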
9. Efficient algorithms to discover alterations with complementary functional association in cancer.
- Author
-
Sarto Basso, Rebecca, Hochbaum, Dorit S., and Vandin, Fabio
- Subjects
CANCER genetics ,PERTURBATION theory ,PHENOTYPES ,ALGORITHMS ,COMPUTATIONAL biology - Abstract
Recent large cancer studies have measured somatic alterations in an unprecedented number of tumours. These large datasets allow the identification of cancer-related sets of genetic alterations by identifying relevant combinatorial patterns. Among such patterns, mutual exclusivity has been employed by several recent methods that have shown its effectiveness in characterizing gene sets associated with cancer. Mutual exclusivity arises because of the complementarity, at the functional level, of alterations in genes which are part of a group (e.g., a pathway) performing a given function. The availability of quantitative target profiles, from genetic perturbations or from clinical phenotypes, provides additional information that can be leveraged to improve the identification of cancer-related gene sets by discovering groups with complementary functional associations with such targets. In this work we study the problem of finding groups of mutually exclusive alterations associated with a quantitative (functional) target. We propose a combinatorial formulation for the problem, and prove that the associated computational problem is computationally hard. We design two algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find sets which are better than the ones obtained by the state-of-the-art method, even when sets are evaluated using the statistical score employed by the latter. In addition, our algorithms are much faster than the state-of-the-art, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines.
We show that on two such datasets, one from project Achilles and one from the Genomics of Drug Sensitivity in Cancer project, UNCOVER identifies several significant gene sets with complementary functional associations with targets. Software available at: . [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
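The mutual-exclusivity intuition behind this entry can be sketched with the classic coverage-minus-overlap criterion. This is a simplification, not UNCOVER's objective, which additionally incorporates the quantitative target profile; the alteration matrix below is toy data:

```python
import numpy as np

# Binary alteration matrix: rows = tumour samples, columns = genes
# (illustrative values only).
alterations = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],  # one sample with overlapping alterations
])

def coverage_minus_overlap(M):
    """Score a gene set: samples covered by at least one alteration,
    penalised by every alteration beyond the first in a sample."""
    per_sample = M.sum(axis=1)
    covered = (per_sample > 0).sum()
    overlap = (per_sample - 1).clip(min=0).sum()
    return covered - overlap

print(coverage_minus_overlap(alterations))  # 4
```

A perfectly mutually exclusive set maximises coverage with zero overlap; here the one doubly-altered sample costs a point, which is the pattern such combinatorial methods search for at scale.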
10. Securing the future of research computing in the biosciences.
- Author
-
Leng, Joanna, Shoura, Massa, McLeish, Tom C. B., Real, Alan N., Hardey, Mariann, McCafferty, James, Ranson, Neil A., and Harris, Sarah A.
- Subjects
COMPUTER systems ,LIFE sciences ,LIFE sciences research ,PHYSICS research ,ASTRONOMICAL research - Abstract
Author summary: Improvements in technology often drive scientific discovery. Therefore, research requires sustained investment in the latest equipment and training for the researchers who are going to use it. Prioritising and administering infrastructure investment is challenging because future needs are difficult to predict. In the past, highly computationally demanding research was associated primarily with particle physics and astronomy experiments. However, as biology becomes more quantitative and bioscientists generate more and more data, their computational requirements may ultimately exceed those of physical scientists. Computation has always been central to bioinformatics, but now imaging experiments have rapidly growing data processing and storage requirements. There is also an urgent need for new modelling and simulation tools to provide insight and understanding of these biophysical experiments. Bioscience communities must work together to provide the software and skills training needed in their areas. Research-active institutions need to recognise that computation is now vital in many more areas of discovery and create an environment where it can be embraced. The public must also become aware of both the power and limitations of computing, particularly with respect to their health and personal data. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
11. Ten simple rules for carrying out and writing meta-analyses.
- Author
-
Forero, Diego A., Lopez-Leon, Sandra, González-Giraldo, Yeimy, and Bagos, Pantelis G.
- Subjects
META-analysis ,TECHNICAL specifications ,MATHEMATICAL variables ,EVIDENCE-based medicine ,SCIENTIFIC literature - Abstract
The article presents ten rules for carrying out and writing meta-analyses. The rules discussed include specifying the topic and type of the meta-analysis, following available guidelines for the different types of meta-analyses, establishing inclusion criteria, and defining key variables.
- Published
- 2019
- Full Text
- View/download PDF
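As a worked example of the quantitative core of many meta-analyses, here is a fixed-effect inverse-variance pooling of per-study effects. The effect sizes and standard errors are hypothetical numbers for illustration only:

```python
import math

# Per-study effect sizes and standard errors (hypothetical).
effects = [0.30, 0.45, 0.10, 0.38]
ses = [0.10, 0.15, 0.20, 0.12]

# Fixed-effect (inverse-variance) pooling: weight each study by 1/SE^2.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"pooled effect: {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

A full meta-analysis would also assess heterogeneity (and likely use a random-effects model if it is substantial), which is among the decisions the entry's rules address.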
12. Ten quick tips for using a Raspberry Pi.
- Author
-
Fletcher, Anthony C. and Mura, Cameron
- Subjects
RASPBERRY Pi ,COMPUTATIONAL biology ,BIOINFORMATICS ,COMPUTER science ,OPEN source software - Abstract
The article provides information on the use of the Raspberry Pi, focusing on computational biology, the teaching of bioinformatics, and the study of computer science. Topics include the central processing unit (CPU) and microcontroller housed on the board, radio-frequency identification (RFID) controllers, and free and open-source licenses.
- Published
- 2019
- Full Text
- View/download PDF
13. OpenCASA: A new open-source and scalable tool for sperm quality analysis.
- Author
-
Alquézar-Baeta, Carlos, Gimeno-Martos, Silvia, Miguel-Jiménez, Sara, Casao, Adriana, Cebrián-Pérez, José Álvaro, Muiño-Blanco, Teresa, Pérez-Pé, Rosaura, Santolaria, Pilar, Yániz, Jesús, and Palacín, Inmaculada
- Subjects
SPERM count ,REPRODUCTION ,COMPUTER software ,BLAND-Altman plot ,FERTILITY - Abstract
In the field of assisted reproductive techniques (ART), computer-assisted sperm analysis (CASA) systems have proved their utility and potential for assessing sperm quality, improving the prediction of the fertility potential of a seminal dose. Although most laboratories and scientific centers use commercial systems, in recent years certain free and open-source alternatives have emerged that can reduce the costs that research groups have to face. However, these open-source alternatives cannot analyze sperm kinetic responses to different stimuli, such as chemotaxis, thermotaxis or rheotaxis. In addition, the programs released to date have not usually been designed to encourage the scalability and continuity of software development. We have developed an open-source CASA software, called OpenCASA, which allows users to study three classical sperm quality parameters: motility, morphometry, and membrane integrity (viability), and offers the possibility of analyzing the guided movement response of spermatozoa to different stimuli (useful for chemotaxis, thermotaxis or rheotaxis studies) or different motile cells such as bacteria, using a single software package. OpenCASA has been released in a version control system on GitHub. This platform will allow researchers not only to download the software but also to be involved in and contribute to further developments. Additionally, a Google group has been created to allow the research community to interact and discuss OpenCASA. For validation of the OpenCASA software, we analysed different simulated sperm populations (for the chemotaxis module) and evaluated 36 ejaculates obtained from 12 fertile rams using other sperm analysis systems (for the motility, membrane integrity and morphology modules).
The results were compared with those obtained by OpenCASA using Pearson’s correlation and Bland-Altman tests, obtaining a high level of correlation in all parameters and good agreement between the different methods used and OpenCASA. With this work, we propose an open-source project oriented to the development of a new software application for sperm quality analysis. The proposed software will use a minimally centralized infrastructure to allow the continued development of its modules by the research community. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
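The validation strategy described in this entry, Pearson correlation plus Bland-Altman agreement, can be sketched as follows. The motility values are invented for illustration; this is not OpenCASA code:

```python
import numpy as np

# Motility measurements (%) of the same samples from two hypothetical
# CASA systems.
system_a = np.array([62.0, 75.0, 55.0, 81.0, 47.0, 68.0])
system_b = np.array([60.0, 78.0, 52.0, 84.0, 45.0, 70.0])

# Pearson correlation measures association between the two systems...
r = np.corrcoef(system_a, system_b)[0, 1]

# ...while a Bland-Altman analysis measures agreement: the mean difference
# (bias) and the 95% limits of agreement around it.
diffs = system_a - system_b
bias = diffs.mean()
sd = diffs.std(ddof=1)
loa_low, loa_high = bias - 1.96 * sd, bias + 1.96 * sd

print(f"Pearson r = {r:.3f}")
print(f"bias = {bias:.2f}, 95% limits of agreement [{loa_low:.2f}, {loa_high:.2f}]")
```

The two statistics answer different questions: two systems can correlate almost perfectly yet disagree systematically, which is why validation studies such as this one report both.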
14. STRFs in primary auditory cortex emerge from masking-based statistics of natural sounds.
- Author
-
Sheikh, Abdul-Saboor, Harper, Nicol S., Drefs, Jakob, Singer, Yosef, Dai, Zhenwen, Turner, Richard E., and Lücke, Jörg
- Subjects
AUDITORY cortex ,TEMPORAL lobe ,COCHLEA ,ALGORITHMS ,NEUROSCIENCES - Abstract
We investigate how the neural processing in auditory cortex is shaped by the statistics of natural sounds. Hypothesising that auditory cortex (A1) represents the structural primitives out of which sounds are composed, we employ a statistical model to extract such components. The inputs to the model are cochleagrams, which approximate the non-linear transformations a sound undergoes from the outer ear, through the cochlea to the auditory nerve. Cochleagram components do not superimpose linearly, but rather according to a rule which can be approximated using the max function. This is a consequence of the compression inherent in the cochleagram and the sparsity of natural sounds. Furthermore, cochleagrams do not have negative values. Cochleagrams are therefore not matched well by the assumptions of standard linear approaches such as sparse coding or ICA. We therefore consider a new encoding approach for natural sounds, which combines a model of early auditory processing with maximal causes analysis (MCA), a sparse coding model which captures both the non-linear combination rule and non-negativity of the data. An efficient truncated EM algorithm is used to fit the MCA model to cochleagram data. We characterize the generative fields (GFs) inferred by MCA with respect to in vivo neural responses in A1 by applying reverse correlation to estimate spectro-temporal receptive fields (STRFs) implied by the learned GFs. Despite the GFs being non-negative, the STRF estimates are found to contain both positive and negative subfields, where the negative subfields can be attributed to explaining-away effects as captured by the applied inference method. A direct comparison with ferret A1 shows many similar forms, and the spectral and temporal modulation tuning of both ferret and model STRFs show similar ranges over the population.
In summary, our model represents an alternative to linear approaches for biological auditory encoding while it captures salient data properties and links inhibitory subfields to explaining away effects. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
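Reverse correlation, the technique this entry uses to estimate STRFs from the learned generative fields, can be illustrated with a toy spike-triggered average. Everything below (the noise "cochleagram", the simulated neuron, all dimensions) is an assumption for illustration; this is not the MCA model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "cochleagram": 4 frequency channels x 500 time bins of noise.
n_freq, n_time, lag = 4, 500, 3
stim = rng.normal(size=(n_freq, n_time))

# Simulated neuron whose true receptive field is channel 2, two bins back:
drive = np.roll(stim[2], 2)
spikes = (drive > 1.0).astype(float)
spikes[:lag] = 0  # skip bins without a full stimulus history

# Reverse correlation: average the stimulus patch preceding each spike.
patches = [stim[:, t - lag:t] for t in np.flatnonzero(spikes)]
strf = np.asarray(patches).mean(axis=0)

# The estimate peaks on channel 2, two bins before the spike
# (patch columns cover lags 3, 2, 1, so that is row 2, column 1).
peak = np.unravel_index(strf.argmax(), strf.shape)
print(peak)
```

Averaging the stimulus history over many spikes recovers the feature the neuron responds to; the paper applies the same logic to the model's generative fields to make them comparable with measured ferret STRFs.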
15. Context-explorer: Analysis of spatially organized protein expression in high-throughput screens.
- Author
-
Ostblom, Joel, Nazareth, Emanuel J. P., Tewary, Mukul, and Zandstra, Peter W.
- Subjects
PROTEIN expression ,CELL aggregation ,PLURIPOTENT stem cells ,SOX2 protein ,ALGORITHMS - Abstract
A growing body of evidence highlights the importance of the cellular microenvironment as a regulator of phenotypic and functional cellular responses to perturbations. We have previously developed cell patterning techniques to control population context parameters, and here we demonstrate context-explorer (CE), a software tool to improve investigation of cell fate acquisition through community-level analyses. We demonstrate the capabilities of CE in the analysis of human and mouse pluripotent stem cells (hPSCs, mPSCs) patterned in colonies of defined geometries in multi-well plates. CE employs a density-based clustering algorithm to identify cell colonies. Using this automatic colony classification methodology, we reach accuracies comparable to manual colony counts in a fraction of the time, both in micropatterned and unpatterned wells. Classifying cells according to their relative position within a colony enables statistical analysis of spatial organization in protein expression within colonies. When applied to colonies of hPSCs, our analysis reveals a radial gradient in the expression of the transcription factors SOX2 and OCT4. We extend these analyses to colonies of different sizes and shapes and demonstrate how the metrics derived by CE can be used to assess the patterning fidelity of micropatterned plates. We have incorporated a number of features to enhance the usability and utility of CE. To appeal to a broad scientific community, all of the software’s functionality is accessible from a graphical user interface, and convenience functions for several common data operations are included. CE is compatible with existing image analysis programs such as CellProfiler and extends the analytical capabilities already provided by these tools. Taken together, CE facilitates investigation of spatially heterogeneous cell populations for fundamental research and drug development validation programs. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
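The colony-identification step in this entry can be sketched with a minimal density-based grouping in the spirit of the clustering CE employs. This is not CE's actual algorithm; the synthetic cell coordinates, eps, and min_size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "well": two tight cell colonies plus a few scattered single cells.
colony_a = rng.normal(loc=(0, 0), scale=0.3, size=(40, 2))
colony_b = rng.normal(loc=(5, 5), scale=0.3, size=(40, 2))
stragglers = rng.uniform(-2, 7, size=(5, 2))
cells = np.vstack([colony_a, colony_b, stragglers])

def density_cluster(points, eps=0.8, min_size=10):
    """Group points whose eps-neighbourhoods chain together (DBSCAN-like)."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    unvisited = set(range(n))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        frontier, members = [seed], {seed}
        while frontier:
            p = frontier.pop()
            for q in np.flatnonzero(d[p] < eps):
                if q in unvisited:
                    unvisited.remove(q)
                    members.add(q)
                    frontier.append(q)
        if len(members) >= min_size:  # small groups are noise, not colonies
            clusters.append(members)
    return clusters

colonies = density_cluster(cells)
print(len(colonies))  # 2
```

Density-based clustering suits this task because colonies are dense clumps of arbitrary shape while isolated cells are noise, and, as in CE, no colony count needs to be specified in advance.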
16. NFTsim: Theory and Simulation of Multiscale Neural Field Dynamics.
- Author
-
Drysdale, Peter M., Rennie, Chris J., Sanz-Leon, Paula, Robinson, Peter A., Knock, Stuart A., Zhao, Xuelong, Abeysuriya, Romesh G., and Fung, Felix K.
- Subjects
BRAIN physiology ,COMPUTER software ,COMPUTER simulation ,NEUROTRANSMITTERS - Abstract
A user-ready, portable, documented software package, NFTsim, is presented to facilitate numerical simulations of a wide range of brain systems using continuum neural field modeling. NFTsim enables users to simulate key aspects of brain activity at multiple scales. At the microscopic scale, it incorporates characteristics of local interactions between cells, neurotransmitter effects, synaptodendritic delays and feedbacks. At the mesoscopic scale, it incorporates information about medium to large scale axonal ranges of fibers, which are essential to model dissipative wave transmission and to produce synchronous oscillations and associated cross-correlation patterns as observed in local field potential recordings of active tissue. At the scale of the whole brain, NFTsim allows for the inclusion of long range pathways, such as thalamocortical projections, when generating macroscopic activity fields. The multiscale nature of the neural activity produced by NFTsim has the potential to enable the modeling of resulting quantities measurable via various neuroimaging techniques. In this work, we give a comprehensive description of the design and implementation of the software. Due to its modularity and flexibility, NFTsim enables the systematic study of an unlimited number of neural systems with multiple neural populations under a unified framework and allows for direct comparison with analytic and experimental predictions. The code is written in C++ and bundled with Matlab routines for a rapid quantitative analysis and visualization of the outputs. The output of NFTsim is stored in plain text files, enabling users to select from a broad range of tools for offline analysis. This software enables a wide and convenient use of powerful physiologically-based neural field approaches to brain modeling. NFTsim is distributed under the Apache 2.0 license. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
17. scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data.
- Author
-
Tian, Luyi, Su, Shian, Dong, Xueyi, Amann-Zalcenstein, Daniela, Biben, Christine, Seidi, Azadeh, Hilton, Douglas J., Naik, Shalin H., and Ritchie, Matthew E.
- Subjects
RNA sequencing ,TRANSCRIPTOMES ,BAR codes ,DATA structures - Abstract
Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols, exploit this information for quality assessment at the sample level, and provide effective visualization of these results in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols that include CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a count matrix that is essential for downstream analysis along with an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. scPipe performs this processing in a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the emerging suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from . [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
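Barcode demultiplexing, the first step scPipe performs, reduces to matching read prefixes against known sample barcodes. A minimal sketch follows (hypothetical barcodes and reads; scPipe's real implementation additionally handles UMIs, base qualities, alignment, and more):

```python
from collections import Counter

# Known sample barcodes (hypothetical 6-mers).
barcodes = {"ACGTAC": "sample1", "TTGCAA": "sample2", "GGATCC": "sample3"}

def assign(read, max_mismatch=1):
    """Assign a read to a sample by its barcode prefix,
    tolerating up to max_mismatch sequencing errors."""
    prefix = read[:6]
    best, best_d = None, max_mismatch + 1
    for bc, sample in barcodes.items():
        d = sum(a != b for a, b in zip(prefix, bc))
        if d < best_d:
            best, best_d = sample, d
    return best if best_d <= max_mismatch else "unassigned"

reads = [
    "ACGTACGGGTTT",  # exact match -> sample1
    "ACGAACGGGTTT",  # one mismatch -> sample1
    "TTGCAACCCAAA",  # exact match -> sample2
    "CCCCCCGGGTTT",  # no close barcode -> unassigned
]
counts = Counter(assign(r) for r in reads)
print(counts)  # Counter({'sample1': 2, 'sample2': 1, 'unassigned': 1})
```

Tallying reads per barcode like this also feeds naturally into sample-level quality assessment, e.g. flagging barcodes with unexpectedly few assigned reads.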
18. OpenSim: Simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement.
- Author
-
Seth, Ajay, Hicks, Jennifer L., Uchida, Thomas K., Habib, Ayman, Dembia, Christopher L., Dunne, James J., Ong, Carmichael F., DeMers, Matthew S., Rajagopal, Apoorva, Millard, Matthew, Hamner, Samuel R., Arnold, Edith M., Yong, Jennifer R., Lakshmikanth, Shrinidhi K., Sherman, Michael A., Ku, Joy P., and Delp, Scott L.
- Subjects
MEDICAL innovations ,SURGICAL robots ,COMPUTER simulation ,COMPUTATIONAL complexity ,NEUROSCIENCES - Abstract
Movement is fundamental to human and animal life, emerging through interaction of complex neural, muscular, and skeletal systems. Study of movement draws from and contributes to diverse fields, including biology, neuroscience, mechanics, and robotics. OpenSim unites methods from these fields to create fast and accurate simulations of movement, enabling two fundamental tasks. First, the software can calculate variables that are difficult to measure experimentally, such as the forces generated by muscles and the stretch and recoil of tendons during movement. Second, OpenSim can predict novel movements from models of motor control, such as kinematic adaptations of human gait during loaded or inclined walking. Changes in musculoskeletal dynamics following surgery or due to human–device interaction can also be simulated; these simulations have played a vital role in several applications, including the design of implantable mechanical devices to improve human grasping in individuals with paralysis. OpenSim is an extensible and user-friendly software package built on decades of knowledge about computational modeling and simulation of biomechanical systems. OpenSim’s design enables computational scientists to create new state-of-the-art software tools and empowers others to use these tools in research and clinical applications. OpenSim supports a large and growing community of biomechanics and rehabilitation researchers, facilitating exchange of models and simulations for reproducing and extending discoveries. Examples, tutorials, documentation, and an active user forum support this community. The OpenSim software is covered by the Apache License 2.0, which permits its use for any purpose including both nonprofit and commercial applications. The source code is freely and anonymously accessible on GitHub, where the community is welcomed to make contributions. Platform-specific installers of OpenSim include a GUI and are available on simtk.org. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
19. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.
- Author
-
Zappia, Luke, Phipson, Belinda, and Oshlack, Alicia
- Subjects
RNA sequencing ,RNA analysis ,GENETIC databases ,MEDICAL research ,COMPUTATIONAL biology - Abstract
As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. To better facilitate selection of appropriate analysis tools, we have created the scRNA-tools database () to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source and open-science approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
20. Tellurium notebooks—An environment for reproducible dynamical modeling in systems biology.
- Author
-
Medley, J. Kyle, Choi, Kiri, König, Matthias, Smith, Lucian, Gu, Stanley, Hellerstein, Joseph, Sealfon, Stuart C., and Sauro, Herbert M.
- Subjects
CHEMICAL elements ,TELLURIUM ,CYTOLOGY ,CELL division ,COMPUTATIONAL biology - Abstract
The considerable difficulty encountered in reproducing the results of published dynamical models limits validation, exploration and reuse of this increasingly large biomedical research resource. To address this problem, we have developed Tellurium Notebook, a software system for model authoring, simulation, and teaching that facilitates building reproducible dynamical models and reusing models by 1) providing a notebook environment which allows models, Python code, and narrative to be intermixed, 2) supporting the COMBINE archive format during model development for capturing model information in an exchangeable format and 3) enabling users to easily simulate and edit public COMBINE-compliant models from public repositories to facilitate studying model dynamics, variants and test cases. Tellurium Notebook, a Python-based Jupyter-like environment, is designed to seamlessly interoperate with these community standards by automating conversion between COMBINE standards formulations and corresponding in-line, human-readable representations. Thus, Tellurium brings to systems biology the strategy used by other literate notebook systems such as Mathematica. These capabilities allow users to edit every aspect of the standards-compliant models and simulations, run the simulations in-line, and re-export to standard formats. We provide several use cases illustrating the advantages of our approach and how it allows development and reuse of models without requiring technical knowledge of standards. Adoption of Tellurium should accelerate model development, reproducibility and reuse. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
21. A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.
- Author
-
Rangan, Aaditya V., McGrouther, Caroline C., Kelsoe, John, Schork, Nicholas, Stahl, Eli, Zhu, Qian, Krishnan, Arjun, Yao, Vicky, Troyanskaya, Olga, Bilaloglu, Seda, Raghavan, Preeti, Bergen, Sarah, Jureus, Anders, Landen, Mikael, and null, null
- Subjects
HUMAN genome ,GENE expression ,MOLECULAR biology ,SINGLE nucleotide polymorphisms ,BIOLOGICAL evolution - Abstract
A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS). [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
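The 2-by-2 "loop" idea in the abstract above can be illustrated with a short sketch. A 2-by-2 submatrix [[a, b], [c, d]] has rank 1 exactly when its determinant a*d - b*c vanishes, so rows and columns that participate in many (near-)zero-determinant loops are candidate bicluster members. This is a toy illustration of the counting principle only, not the authors' covariate-corrected algorithm; the matrix, tolerance, and scoring scheme here are assumptions made for the example.

```python
import random

def loop_score(M, tol=1e-9):
    """Score each row/column by how many (numerically) rank-1 2x2 'loops' it joins."""
    n, m = len(M), len(M[0])
    row_score = [0] * n
    col_score = [0] * m
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(m):
                for l in range(k + 1, m):
                    a, b = M[i][k], M[i][l]
                    c, d = M[j][k], M[j][l]
                    det = a * d - b * c
                    scale = max(abs(a * d), abs(b * c), 1e-12)
                    if abs(det) <= tol * scale:  # loop is rank 1 up to tolerance
                        row_score[i] += 1
                        row_score[j] += 1
                        col_score[k] += 1
                        col_score[l] += 1
    return row_score, col_score

random.seed(0)
u, v = [1.0, 2.0, 3.0], [1.0, 3.0, 5.0]
M = [[random.uniform(1, 9) for _ in range(6)] for _ in range(6)]
for i in range(3):
    for j in range(3):
        M[i][j] = u[i] * v[j]  # plant a rank-1 bicluster in rows/cols 0-2

rows, cols = loop_score(M)
top_rows = sorted(sorted(range(6), key=lambda r: -rows[r])[:3])
top_cols = sorted(sorted(range(6), key=lambda c: -cols[c])[:3])
print(top_rows, top_cols)  # → [0, 1, 2] [0, 1, 2] (the planted block)
```

Each planted row participates in 2 partner rows × 3 column pairs = 6 exact rank-1 loops, while random entries almost never produce a determinant within tolerance, so the planted rows and columns dominate the scores.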
22. Porcupine: A visual pipeline tool for neuroimaging analysis.
- Author
-
van Mourik, Tim, Snoek, Lukas, Knapen, Tomas, and Norris, David G.
- Subjects
BRAIN imaging ,ACQUISITION of data ,DATA analysis ,DATA structures ,DATA modeling - Abstract
The field of neuroimaging is rapidly adopting a more reproducible approach to data acquisition and analysis. Data structures and formats are being standardised and data analyses are getting more automated. However, as data analysis becomes more complicated, researchers often have to write longer analysis scripts, spanning different tools across multiple programming languages. This makes it more difficult to share or recreate code, reducing the reproducibility of the analysis. We present a tool, Porcupine, that constructs one’s analysis visually and automatically produces analysis code. The graphical representation improves understanding of the performed analysis, while retaining the flexibility of modifying the produced code manually to custom needs. Not only does Porcupine produce the analysis code, it also creates a shareable environment for running the code in the form of a Docker image. Together, this forms a reproducible way of constructing, visualising and sharing one’s analysis. Currently, Porcupine links to Nipype functionalities, which in turn accesses most standard neuroimaging analysis tools. Our goal is to release researchers from the constraints of specific implementation details, thereby freeing them to think about novel and creative ways to solve a given problem. Porcupine improves the overview researchers have of their processing pipelines, and facilitates both the development and communication of their work. This will reduce the threshold at which less expert users can generate reusable pipelines. With Porcupine, we bridge the gap between a conceptual and an implementational level of analysis and make it easier for researchers to create reproducible and shareable science. We provide a wide range of examples and documentation, as well as installer files for all platforms on our website: . Porcupine is free, open source, and released under the GNU General Public License v3.0. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
23. beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types.
- Author
-
Lun, Aaron T. L., Pagès, Hervé, and Smith, Mike L.
- Subjects
RNA sequencing ,C++ ,GENOMICS ,COMPUTATIONAL biology - Abstract
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
24. Fluctuating Finite Element Analysis (FFEA): A continuum mechanics software tool for mesoscale simulation of biomolecules.
- Author
-
Solernou, Albert, Hanson, Benjamin S., Richardson, Robin A., Welch, Robert, Read, Daniel J., Harlen, Oliver G., and Harris, Sarah A.
- Subjects
FINITE element method ,MACROMOLECULES ,APPLICATION software ,TOMOGRAPHY ,MESOSCALE convective complexes - Abstract
Fluctuating Finite Element Analysis (FFEA) is a software package designed to perform continuum mechanics simulations of proteins and other globular macromolecules. It combines conventional finite element methods with stochastic thermal noise, and is appropriate for simulations of large proteins and protein complexes at the mesoscale (length-scales in the range of 5 nm to 1 μm), where there is currently a paucity of modelling tools. It requires 3D volumetric information as input, which can be low resolution structural information such as cryo-electron tomography (cryo-ET) maps or much higher resolution atomistic co-ordinates from which volumetric information can be extracted. In this article we introduce our open source software package for performing FFEA simulations which we have released under a GPLv3 license. The software package includes a C++ implementation of FFEA, together with tools to assist the user to set up the system from Electron Microscopy Data Bank (EMDB) or Protein Data Bank (PDB) data files. We also provide a PyMOL plugin to perform basic visualisation and additional Python tools for the analysis of FFEA simulation trajectories. This manuscript provides a basic background to the FFEA method, describing the implementation of the core mechanical model and how intermolecular interactions and the solvent environment are included within this framework. We provide prospective FFEA users with a practical overview of how to set up an FFEA simulation with reference to our publicly available online tutorials and manuals that accompany this first release of the package. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Ten simple rules for collaborative lesson development.
- Author
-
Devenyi, Gabriel A., Emonet, Rémi, Harris, Rayna M., Hertweck, Kate L., Irving, Damien, Milligan, Ian, and Wilson, Greg
- Subjects
CURRICULUM planning ,LESSON planning - Abstract
The article offers tips on how to develop collaborative lessons.
- Published
- 2018
- Full Text
- View/download PDF
26. MAGPIE: Simplifying access and execution of computational models in the life sciences.
- Author
-
Baldow, Christoph, Salentin, Sebastian, Schroeder, Michael, Roeder, Ingo, and Glauche, Ingmar
- Subjects
LIFE sciences ,COMPUTER simulation ,SYSTEMS biology ,COMPUTATIONAL biology ,VIRTUAL machine systems - Abstract
Over the past decades, quantitative methods linking theory and observation have become increasingly important in many areas of life science. Subsequently, a large number of mathematical and computational models have been developed. The BioModels database alone lists more than 140,000 Systems Biology Markup Language (SBML) models. However, while exchange within specific model classes has been supported by standardisation and database efforts, the generic application and especially the re-use of models is still limited by practical issues such as easy and straightforward model execution. MAGPIE, a Modeling and Analysis Generic Platform with Integrated Evaluation, closes this gap by providing a software platform for both publishing and executing computational models without restrictions on the programming language, thereby combining a maximum of flexibility for programmers with easy handling for non-technical users. MAGPIE goes beyond classical SBML platforms by including all models, independent of the underlying programming language, ranging from simple script models to complex data integration and computations. We demonstrate the versatility of MAGPIE using four prototypic example cases. We also outline the potential of MAGPIE to improve transparency and reproducibility of computational models in life sciences. A demo server is available at magpie.imb.medizin.tu-dresden.de. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
27. pSSAlib: The partial-propensity stochastic chemical network simulator.
- Author
-
Ostrenko, Oleksandr, Incardona, Pietro, Ramaswamy, Rajesh, Brusch, Lutz, and Sbalzarini, Ivo F.
- Subjects
EUKARYOTIC cells ,HETEROKONTOPHYTA ,STOCHASTIC analysis ,MATHEMATICAL analysis ,CHEMICAL reactions - Abstract
Chemical reaction networks are ubiquitous in biology, and their dynamics is fundamentally stochastic. Here, we present the software library pSSAlib, which provides a complete and concise implementation of the most efficient partial-propensity methods for simulating exact stochastic chemical kinetics. pSSAlib can import models encoded in the Systems Biology Markup Language, supports time delays in chemical reactions, and handles stochastic spatiotemporal reaction-diffusion systems. It also provides tools for statistical analysis of simulation results and supports multiple output formats. It has previously been used for studies of biochemical reaction pathways and to benchmark other stochastic simulation methods. Here, we describe pSSAlib in detail and apply it to a new model of the endocytic pathway in eukaryotic cells, leading to the discovery of a stochastic counterpart of the cut-out switch motif underlying early-to-late endosome conversion. pSSAlib is provided as a stand-alone command-line tool and as a developer API. We also provide a plug-in for the SBMLToolbox. The open-source code and pre-packaged installers are freely available from . [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
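For readers unfamiliar with exact stochastic chemical kinetics, the baseline that pSSAlib's partial-propensity methods accelerate is Gillespie's direct-method stochastic simulation algorithm (SSA). The sketch below applies it to a single degradation reaction A → ∅ with propensity k·A; it is an illustrative sketch in plain Python, not pSSAlib's API, and the function name, rate, and parameters are assumptions for the example.

```python
import random

def gillespie(x0, rate, t_max, seed=1):
    """Direct-method SSA for the single reaction A -> (nothing), propensity rate*A.
    Returns the lists of reaction times and molecule counts."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while x > 0 and t < t_max:
        a = rate * x                 # total propensity of the system
        t += rng.expovariate(a)      # waiting time to the next reaction ~ Exp(a)
        x -= 1                       # fire the (only) reaction
        times.append(t)
        counts.append(x)
    return times, counts

times, counts = gillespie(x0=50, rate=0.5, t_max=100.0)
print(counts[0], counts[-1], len(times))
```

With several reaction channels, the direct method recomputes all propensities after every firing; partial-propensity methods such as those in pSSAlib reduce that cost by factorising propensities over reaction partners.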
28. The eBioKit, a stand-alone educational platform for bioinformatics.
- Author
-
Hernández-de-Diego, Rafael, de Villiers, Etienne P., Klingström, Tomas, Gourlé, Hadrien, Conesa, Ana, and Bongcam-Rudloff, Erik
- Subjects
TEACHING aids ,BIOINFORMATICS ,EDUCATIONAL technology ,EDUCATIONAL innovations ,EDUCATIONAL resources ,EDUCATION - Abstract
Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
29. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics.
- Author
-
Eastman, Peter, Swails, Jason, Chodera, John D., McGibbon, Robert T., Zhao, Yutong, Beauchamp, Kyle A., Wang, Lee-Ping, Simmonett, Andrew C., Harrigan, Matthew P., Stern, Chaya D., Wiewiora, Rafal P., Brooks, Bernard R., and Pande, Vijay S.
- Subjects
MOLECULAR dynamics ,COMPUTER software ,ALGORITHMS ,COMPUTER-assisted molecular modeling ,CODING theory - Abstract
OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM. This makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
30. BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods.
- Author
-
Gorgolewski, Krzysztof J., Alfaro-Almagro, Fidel, Auer, Tibor, Bellec, Pierre, Capotă, Mihai, Chakravarty, M. Mallar, Churchill, Nathan W., Cohen, Alexander Li, Craddock, R. Cameron, Devenyi, Gabriel A., Eklund, Anders, Esteban, Oscar, Flandin, Guillaume, Ghosh, Satrajit S., Guntupalli, J. Swaroop, Jenkinson, Mark, Keshavan, Anisha, Kiar, Gregory, Liem, Franziskus, and Raamana, Pradeep Reddy
- Subjects
BRAIN imaging ,DATA structures ,HIGH performance computing ,COMPUTER operating systems ,LEGAL compliance - Abstract
The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration and, thanks to the comprehensiveness of the BIDS standard, they require little manual user input. Previous containerized data processing solutions were limited to single user environments and not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready to use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
31. ASPASIA: A toolkit for evaluating the effects of biological interventions on SBML model behaviour.
- Author
-
Evans, Stephanie, Alden, Kieran, Cucurull-Sanchez, Lourdes, Larminie, Christopher, Coles, Mark C., Kullberg, Marika C., and Timmis, Jon
- Subjects
BEHAVIOR ,JAVA programming language ,SYSTEMS biology ,BIOINFORMATICS ,CALIBRATION ,COMPUTER software ,MATHEMATICAL models - Abstract
A calibrated computational model reflects behaviours that are expected or observed in a complex system, providing a baseline upon which sensitivity analysis techniques can be used to analyse pathways that may impact model responses. However, calibration of a model where a behaviour depends on an intervention introduced after a defined time point is difficult, as model responses may be dependent on the conditions at the time the intervention is applied. We present ASPASIA (Automated Simulation Parameter Alteration and SensItivity Analysis), a cross-platform, open-source Java toolkit that addresses a key deficiency in software tools for understanding the impact an intervention has on system behaviour for models specified in Systems Biology Markup Language (SBML). ASPASIA can generate and modify models using SBML solver output as an initial parameter set, allowing interventions to be applied once a steady state has been reached. Additionally, multiple SBML models can be generated where a subset of parameter values are perturbed using local and global sensitivity analysis techniques, revealing the model’s sensitivity to the intervention. To illustrate the capabilities of ASPASIA, we demonstrate how this tool has generated novel hypotheses regarding the mechanisms by which Th17-cell plasticity may be controlled in vivo. By using ASPASIA in conjunction with an SBML model of Th17-cell polarisation, we predict that promotion of the Th1-associated transcription factor T-bet, rather than inhibition of the Th17-associated transcription factor RORγt, is sufficient to drive switching of Th17 cells towards an IFN-γ-producing phenotype. Our approach can be applied to all SBML-encoded models to predict the effect that intervention strategies have on system behaviour. ASPASIA, released under the Artistic License (2.0), can be downloaded from . [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
32. Stochastic Simulation Service: Bridging the Gap between the Computational Expert and the Biologist.
- Author
-
Drawert, Brian, Hellander, Andreas, Bales, Ben, Banerjee, Debjani, Bellesia, Giovanni, Daigle, Bernie J., Jr., Douglas, Geoffrey, Gu, Mengyuan, Gupta, Anand, Hellander, Stefan, Horuk, Chris, Nath, Dibyendu, Takkar, Aviral, Wu, Sheng, Lötstedt, Per, Krintz, Chandra, and Petzold, Linda R.
- Subjects
STOCHASTIC systems ,BIOCHEMICAL models ,SIMULATION methods & models ,DISCRETE systems ,BIOLOGISTS - Abstract
We present StochSS: Stochastic Simulation as a Service, an integrated development environment for modeling and simulation of both deterministic and discrete stochastic biochemical systems in up to three dimensions. An easy to use graphical user interface enables researchers to quickly develop and simulate a biological model on a desktop or laptop, which can then be expanded to incorporate increasing levels of complexity. StochSS features state-of-the-art simulation engines. As the demand for computational power increases, StochSS can seamlessly scale computing resources in the cloud. In addition, StochSS can be deployed as a multi-user software environment where collaborators share computational resources and exchange models via a public model repository. We demonstrate the capabilities and ease of use of StochSS with an example of model development and simulation at increasing levels of complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
33. Ten simple rules for organizing a bioinformatics training course in low- and middle-income countries
- Author
-
Patricia Carvajal-López, Piraveen Gopalasingam, Amel Ghouila, Sarah L. Morgan, Guilherme Oliveira, Verena Ras, Paballo Abel Chauke, Alice Matimba, Alejandro Reyes, Selene L. Fernandez-Valverde, Nicola Mulder, Marco Cristancho, Javier De Las Rivas, Fatma Z. Guerfali, Victoria Dominguez Del Angel, Benjamin Moore, Wellcome Trust, National Institutes of Health (US), Biotechnology and Biological Sciences Research Council (UK), Global Challenges Research Fund, Instituto de Salud Carlos III, Consejo Superior de Investigaciones Científicas (España), Universidad de Salamanca, and Instituto Nacional de Bioinformática (España)
- Subjects
Bioinformatics ,Computational Biology ,Education ,Training course ,Curriculum ,Workshops ,Developing Countries ,Low and middle income countries - Abstract
© 2021 Moore et al. Bioinformatics training is required at every stage of a scientist’s research career. Continual bioinformatics training allows exposure to an ever-changing and growing repertoire of techniques and databases, and so biologists, computational scientists, and healthcare practitioners are all seeking learning opportunities in the use of computational resources and tools designed for data storage, retrieval, and analysis. There are abundant opportunities for accessing bioinformatics training for scientists in high-income countries (HICs), with well-equipped facilities and participants and trainers requiring minimal travel and financial costs, alongside a range of general advice for developing short bioinformatics training courses [1–3]. However, regionally targeted bioinformatics training in low- and middle-income countries (LMICs) often requires more extensive local and external support, organization, and travel. Due to the limited expertise in bioinformatics in LMICs in general, most bioinformatics training requires a fair amount of collaboration with experts beyond the local community, country, or region. A common model of training, used as the basis of this article, includes a local host collaborating with local, regional, and international experts gathering to train local or regional participants. Recently, there has been a growth of capacity strengthening initiatives in LMICs, such as the Pan African Bioinformatics Network for Human Heredity and Health in Africa (H3ABioNet) Initiative [4–6], the Capacity Building for Bioinformatics in Latin America (CABANA) Project [7], the Asia Pacific BioInformatics Network (APBioNet) [8], and the Wellcome Connecting Science Courses and Conferences program [9]. One of the important strands of these initiatives is a drive to organize and deliver valuable bioinformatics training, but organizing and delivering short bioinformatics training workshops in an LMIC present a unique set of challenges.
This paper attempts to build upon the sage advice for organizing bioinformatics workshops with specific guidance for organizing and delivering them in LMICs. It describes the processes to follow in organizing courses, taking into consideration the low-resource setting. We should also note that LMICs are not a monolithic group and that setting, context, temporality, and specific location matter. LMICs are a complex regional grouping [10] and should be treated as such; however, we will present some common lessons that we hope will help organizers and trainers of bioinformatics training events in LMICs to navigate the often different, challenging, and rewarding experience.
The authors who contributed to this manuscript are funded as follows: BM receives salary support from Wellcome Trust grants [WT108749/Z/15/Z, WT108749/Z/15/A]; PC, VR, NM, and AG’s salaries are funded in whole, or in part, by the NIH Common Fund H3ABioNet grant [U24HG006941]; MC, SLFV, AR, PG, and PCL’s salaries were partly funded by the UKRI-BBSRC ‘Capacity building for bioinformatics in Latin America’ (CABANA) grant, on behalf of the Global Challenges Research Fund [BB/P027849/1]; JDLR is funded by ISCiii AES [ref. PI18/00591] at the CSIC/USAL (Spain) and by CYTED, RIABIO (Red Iberoamericana 521RT0118); AM’s salary is funded by [WT206194/Z/17/Z]; GO is funded by the CABANA grant; and SM is funded by the EMBL-EBI.
- Published
- 2021
34. Modeling of Large-Scale Functional Brain Networks Based on Structural Connectivity from DTI: Comparison with EEG Derived Phase Coupling Networks and Evaluation of Alternative Methods along the Modeling Path.
- Author
-
Finger, Holger, Bönstrup, Marlene, Cheng, Bastian, Messé, Arnaud, Hilgetag, Claus, Thomalla, Götz, Gerloff, Christian, and König, Peter
- Subjects
DIFFUSION tensor imaging ,OSCILLATING chemical reactions ,ELECTROENCEPHALOGRAPHY ,RADIOGRAPHY ,BRAIN ,SYNCHRONIC order ,LARGE-scale brain networks - Abstract
In this study, we investigate if phase-locking of fast oscillatory activity relies on the anatomical skeleton and if simple computational models informed by structural connectivity can help further to explain missing links in the structure-function relationship. We use diffusion tensor imaging data and alpha band-limited EEG signal recorded in a group of healthy individuals. Our results show that about 23.4% of the variance in empirical networks of resting-state functional connectivity is explained by the underlying white matter architecture. Simulating functional connectivity using a simple computational model based on the structural connectivity can increase the match to 45.4%. In a second step, we use our modeling framework to explore several technical alternatives along the modeling path. First, we find that an augmentation of homotopic connections in the structural connectivity matrix improves the link to functional connectivity while a correction for fiber distance slightly decreases the performance of the model. Second, a more complex computational model based on Kuramoto oscillators leads to a slight improvement of the model fit. Third, we show that the comparison of modeled and empirical functional connectivity at source level is much more specific for the underlying structural connectivity. However, different source reconstruction algorithms gave comparable results. Of note, as the fourth finding, the model fit was much better if zero-phase lag components were preserved in the empirical functional connectome, indicating a considerable amount of functionally relevant synchrony taking place with near zero or zero-phase lag. The combination of the best performing alternatives at each stage in the pipeline results in a model that explains 54.4% of the variance in the empirical EEG functional connectivity. 
Our study shows that large-scale brain circuits of fast neural network synchrony strongly rely upon the structural connectome and that simple computational models of neural activity can explain missing links in the structure-function relationship. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
35. PhyloBot: A Web Portal for Automated Phylogenetics, Ancestral Sequence Reconstruction, and Exploration of Mutational Trajectories.
- Author
-
Hanson-Smith, Victor and Johnson, Alexander
- Subjects
WEB portals ,PHYLOGENY ,AMINO acid sequence ,PROTEIN structure ,CLOUD computing - Abstract
The method of phylogenetic ancestral sequence reconstruction is a powerful approach for studying evolutionary relationships among protein sequence, structure, and function. In particular, this approach allows investigators to (1) reconstruct and “resurrect” (that is, synthesize in vivo or in vitro) extinct proteins to study how they differ from modern proteins, (2) identify key amino acid changes that, over evolutionary timescales, have altered the function of the protein, and (3) order historical events in the evolution of protein function. Widespread use of this approach has been slow among molecular biologists, in part because the methods require significant computational expertise. Here we present PhyloBot, a web-based software tool that makes ancestral sequence reconstruction easy. Designed for non-experts, it integrates all the necessary software into a single user interface. Additionally, PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Early versions of this software were used in previous studies to discover genetic mechanisms underlying the functions of diverse protein families, including V-ATPase ion pumps, DNA-binding transcription regulators, and serine/threonine protein kinases. PhyloBot runs in a web browser, and is available at the following URL: . The software is implemented in Python using the Django web framework, and runs on elastic cloud computing resources from Amazon Web Services. Users can create and submit jobs on our free server (at the URL listed above), or use our open-source code to launch their own PhyloBot server. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
36. Ten Simple Rules for Curating and Facilitating Small Workshops.
- Author
-
McInerny, Greg J.
- Subjects
FORUMS ,WORKSHOPS (Facilities) -- Design & construction ,RESEARCH ,TOPIC & comment (Grammar) ,DISCUSSION ,TIME perspective ,INTERACTION model (Communication) - Abstract
The article discusses how to create good workshops, which can take diverse forms and serve different goals, such as exploring a single research topic, initiating a working group, or building interdisciplinary collaborations. All require attendees, timetables, and interactions. Topics discussed include processes for curating and facilitating workshops, such as assessing past successes and failures, choosing a workshop name, and preparing speakers' talks.
- Published
- 2016
- Full Text
- View/download PDF
37. Computationally Efficient Implementation of a Novel Algorithm for the General Unified Threshold Model of Survival (GUTS).
- Author
-
Albert, Carlo, Vogel, Sören, and Ashauer, Roman
- Subjects
SURVIVAL analysis (Biometry) ,CALIBRATION ,TOXICOLOGY ,EPIDEMIOLOGY ,STARVATION ,ENGINEERING reliability theory - Abstract
The General Unified Threshold model of Survival (GUTS) provides a consistent mathematical framework for survival analysis. However, the calibration of GUTS models is computationally challenging. We present a novel algorithm and its fast implementation in our R package, GUTS, that help to overcome these challenges. We show a step-by-step application example consisting of model calibration and uncertainty estimation as well as making probabilistic predictions and validating the model with new data. Using self-defined wrapper functions, we show how to produce informative text printouts and plots without effort, for the inexperienced as well as the advanced user. The complete ready-to-run script is available as supplemental material. We expect that our software facilitates novel re-analysis of existing survival data as well as asking new research questions in a wide range of sciences. In particular the ability to quickly quantify stressor thresholds in conjunction with dynamic compensating processes, and their uncertainty, is an improvement that complements current survival analysis methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
38. Training in High-Throughput Sequencing: Common Guidelines to Enable Material Sharing, Dissemination, and Reusability.
- Author
-
Schiffthaler, Bastian, Kostadima, Myrto, Delhomme, Nicolas, and Rustici, Gabriella
- Subjects
DATA analysis ,COMPUTER software reusability ,TRAINING manuals ,TECHNICAL manuals ,LEARNING ability - Abstract
The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field has resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in materials preparation, reduce the duplication of effort by increasing the usage of existing materials, and allow for the sharing of teaching experience among the HTS trainers’ community. To achieve this, we have developed a strategy for materials’ curation and dissemination. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. This repository uses Git; hence, it is decentralized and self-managed by the community and can be forked/built-upon by all users. The repository is accessible at . [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
39. netgsa: Fast computation and interactive visualization for topology-based pathway enrichment analysis
- Author
-
Yue, Kun, Ma, Jing, Shojaie, Ali, and Hellstern, Michael
- Subjects
Male ,Computer science ,Gene Expression ,Computer Architecture ,User-Computer Interface ,Software ,Microcomputers ,Graph drawing ,Breast Tumors ,Medicine and Health Sciences ,Drug Interactions ,Biology (General) ,Ecology ,Prostate Cancer ,Prostate Diseases ,Software Engineering ,Genomics ,Interaction information ,Oncology ,Computational Theory and Mathematics ,Modeling and Simulation ,Engineering and Technology ,Female ,User interface ,Network Analysis ,Research Article ,Network analysis ,Computer and Information Sciences ,QH301-705.5 ,Urology ,Breast Neoplasms ,Genome Complexity ,Topology ,Computer Software ,Cellular and Molecular Neuroscience ,Breast Cancer ,Genetics ,Humans ,Leverage (statistics) ,Molecular Biology ,Interactive visualization ,Ecology, Evolution, Behavior and Systematics ,Pharmacology ,business.industry ,Computational Biology ,Prostatic Neoplasms ,Cancers and Neoplasms ,Biology and Life Sciences ,Genitourinary Tract Tumors ,Personal computer ,business ,User Interfaces - Abstract
Existing software tools for topology-based pathway enrichment analysis are either computationally inefficient, have undesirable statistical power, or require expert knowledge to leverage the methods’ capabilities. To address these limitations, we have overhauled NetGSA, an existing topology-based method, to provide a computationally efficient, user-friendly tool that offers interactive visualization. Pathway enrichment analysis for thousands of genes can be performed in minutes on a personal computer without sacrificing statistical power. The new software also removes the need for expert knowledge by directly curating gene-gene interaction information from multiple external databases. Lastly, by utilizing the capabilities of Cytoscape, the new software also offers interactive and intuitive network visualization.
Author summary: With the increase in publicly available pathway topology information, topology-based pathway enrichment methods have become effective tools to analyze omics data. While many different methods are available, none are uniformly best. This paper focused on overhauling an existing topology-based method, NetGSA. The three key improvements included dramatically reduced computation time so pathway enrichment can be performed within minutes on a personal computer, integration of publicly available pathway topology databases so users can easily leverage the entire capabilities of the NetGSA method, and facilitating interactive visualization of results through an interface with Cytoscape, a popular network visualization tool. The improved NetGSA was compared to the previous version as well as other similar pathway topology-based methods and achieves competitive statistical power. With these improvements and NetGSA’s flexibility to address a diverse set of problems and data types, we believe that the new NetGSA can be a useful tool for practitioners.
The updated NetGSA is available on CRAN at https://cran.r-project.org/web/packages/netgsa/index.html and the development version is available on GitHub at https://github.com/mikehellstern/netgsa.
- Published
- 2021
40. Ten simple rules for writing Dockerfiles for reproducible data science
- Author
-
Markel, Scott, Nüst, Daniel, Sochat, Vanessa, Marwick, Ben, Eglen, Stephen J., Head, Tim, Hirst, Tony, Evans, Benjamin D., Nüst, Daniel [0000-0002-0024-5046], Sochat, Vanessa [0000-0002-4387-3819], Marwick, Ben [0000-0001-7879-4531], Eglen, Stephen J [0000-0001-8607-8025], Hirst, Tony [0000-0001-6921-702X], and Apollo - University of Cambridge Repository
- Subjects
0301 basic medicine ,Computer and Information Sciences ,QH301-705.5 ,Process (engineering) ,Computer science ,Social Sciences ,Context (language use) ,Guidelines as Topic ,Research and Analysis Methods ,Scholarly communication ,Computer Software ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Habits ,0302 clinical medicine ,Software ,Genetics ,Psychology ,Biology (General) ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Data Management ,Metadata ,Behavior ,Ecology ,business.industry ,Software Tools ,Data Science ,Software Engineering ,Biology and Life Sciences ,Reproducibility of Results ,Research Assessment ,Transparency (behavior) ,Data science ,Reproducibility ,Source Code ,030104 developmental biology ,Workflow ,Editorial ,Computational Theory and Mathematics ,Modeling and Simulation ,Container (abstract data type) ,Engineering and Technology ,Programming Languages ,business ,030217 neurology & neurosurgery ,Algorithms - Abstract
Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
Author summary: Computers and algorithms are ubiquitous in research. Therefore, defining the computing environment, i.e., the body of all software used directly or indirectly by a researcher, is important, because it allows other researchers to recreate the environment to understand, inspect, and reproduce an analysis. A helpful abstraction for capturing the computing environment is a container, whereby a container is created from a set of instructions in a recipe. For the most common containerisation software, Docker, this recipe is called a Dockerfile. We believe that in a scientific context, researchers should follow specific practices for writing a Dockerfile. These practices might be somewhat different from the practices of generic software developers in that researchers often need to focus on transparency and understandability rather than performance considerations.
The rules presented here are intended to help researchers, especially newcomers to containerisation, leverage containers for open and effective scholarly communication and collaboration while avoiding the pitfalls that are especially irksome in a research lifecycle. The recommendations cover a deliberate approach to Dockerfile creation, formatting and style, documentation, and habits for using containers.
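The kind of practices this article recommends can be illustrated with a minimal sketch of a Dockerfile for a data-analysis workflow; the base image, labels, file names, and pinned versions below are hypothetical examples, not taken from the article:

```dockerfile
# Pin the base image to an explicit version tag so rebuilds are reproducible
FROM python:3.9-slim

# Document provenance and maintainer (values here are placeholders)
LABEL org.opencontainers.image.authors="researcher@example.org" \
      org.opencontainers.image.source="https://example.org/my-analysis"

# Install pinned dependencies in one layer; requirements.txt lists exact versions
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy the analysis code last so the dependency layers above stay cached
COPY analysis/ /home/analysis/
WORKDIR /home/analysis
CMD ["python", "run_analysis.py"]
```

Pinning versions, labelling the image, and ordering instructions from least to most frequently changed are examples of the deliberate, documentation-oriented style the rules advocate.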
- Published
- 2020
41. Designing a course model for distance-based online bioinformatics training in Africa: The H3ABioNet experience.
- Author
-
Gurwitz, Kim T., Aron, Shaun, Panji, Sumir, Maslamoney, Suresh, Fernandes, Pedro L., Judge, David P., Ghouila, Amel, Domelevo Entfellner, Jean-Baka, Guerfali, Fatma Z., Saunders, Colleen, Mansour Alzohairy, Ahmed, Salifu, Samson P., Ahmed, Rehab, Cloete, Ruben, Kayondo, Jonathan, Ssemwanga, Deogratius, and Mulder, Nicola
- Subjects
BIOINFORMATICS ,ONLINE education ,DISTANCE education ,EDUCATION - Abstract
Africa is not unique in its need for basic bioinformatics training for individuals from a diverse range of academic backgrounds. However, particular logistical challenges in Africa, most notably access to bioinformatics expertise and internet stability, must be addressed in order to meet this need on the continent. H3ABioNet (), the Pan African Bioinformatics Network for H3Africa, has therefore developed an innovative, free-of-charge “Introduction to Bioinformatics” course, taking these challenges into account as part of its educational efforts to provide on-site training and develop local expertise inside its network. A multiple-delivery–mode learning model was selected for this 3-month course in order to increase access to (mostly) African, expert bioinformatics trainers. The content of the course was developed to include a range of fundamental bioinformatics topics at the introductory level. For the first iteration of the course (2016), classrooms with a total of 364 enrolled participants were hosted at 20 institutions across 10 African countries. To ensure that classroom success did not depend on stable internet, trainers pre-recorded their lectures, and classrooms downloaded and watched these locally during biweekly contact sessions. The trainers were available via video conferencing to take questions during contact sessions, as well as via online “question and discussion” forums outside of contact session time. This learning model, developed for a resource-limited setting, could easily be adapted to other settings. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF