
OperonSEQer: A set of machine-learning algorithms with threshold voting for detection of operon pairs using short-read RNA-sequencing data

Authors :: Raga Krishnakumar
Anne M. Ruffing
Source :: PLoS Computational Biology, Vol 18, Iss 1, p e1009731 (2022)
Publication Year :: 2022
Publisher :: Public Library of Science (PLoS), 2022.
Abstract: Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system to allow users to choose the stringency of operon calls depending on whether their priority is high recall or high specificity. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing for this method to be expanded as additional data is generated. We show that our approach detects operon pairs that are missed by current methods by comparing our predictions to publicly available long-read sequencing data. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility, and adaptability.
Author summary: Bacteria and archaea, single-cell organisms collectively known as prokaryotes, live in all imaginable environments and comprise the majority of living organisms on this planet. Prokaryotes play a critical role in the homeostasis of multicellular organisms (such as animals and plants) and ecosystems. In addition, bacteria can be pathogenic and cause a variety of diseases in these same hosts and ecosystems. In short, understanding the biology and molecular functions of bacteria and archaea and devising mechanisms to engineer and optimize their properties are critical scientific endeavors with significant implications in healthcare, agriculture, manufacturing, and climate science, among others. One major molecular difference between unicellular and multicellular organisms is the way they express genes: multicellular organisms make individual RNA molecules for each gene, while prokaryotes express operons (i.e., a group of genes coding functionally related proteins) in contiguous polycistronic RNA molecules. Understanding which genes exist within operons is critical for elucidating basic biology and for engineering organisms. In this work, we use a combination of statistical and machine learning-based methods to use next-generation sequencing data to predict operon structure across a range of prokaryotes. Our method provides an easily implemented, robust, accurate, and flexible way to determine operon structure in an organism-agnostic manner using readily available data.
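
Illustrative sketch (not from the record or the authors' code): the two ideas named in the abstract, Kruskal-Wallis features and threshold voting, can be outlined roughly as follows. The choice of three coverage groups (first gene, intergenic region, second gene), the function names, the coverage numbers, and the six model votes are all assumptions made for illustration. The test is hand-rolled here to stay dependency-free; `scipy.stats.kruskal` offers the same test as a library call.

```python
import math
from collections import Counter


def kruskal_wallis_3(group_a, group_b, group_c):
    """Tie-corrected Kruskal-Wallis H statistic and p-value for three groups."""
    groups = [list(group_a), list(group_b), list(group_c)]
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    counts = Counter(pooled)

    # Tied values share the mean of the ranks they occupy.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2
        next_rank += c

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    tie_correction = 1 - sum(c**3 - c for c in counts.values()) / (n**3 - n)
    if tie_correction == 0:
        raise ValueError("all values are identical; the test is undefined")
    h /= tie_correction

    # Three groups give 2 degrees of freedom, where the chi-square
    # survival function reduces to exp(-h / 2).
    return h, math.exp(-h / 2)


def threshold_vote(calls, min_votes):
    """Call an operon pair when at least min_votes models predict one (1)."""
    return sum(calls) >= min_votes


# Made-up per-base coverage values (assumption, not real data).
similar = kruskal_wallis_3(
    [50, 52, 49, 51, 53], [48, 50, 51, 49, 52], [51, 50, 52, 48, 49]
)
different = kruskal_wallis_3(
    [50, 52, 49, 51, 53], [2, 1, 0, 3, 1], [200, 210, 190, 205, 198]
)

# Hypothetical votes from six hypothetical trained models for one gene pair.
hypothetical_calls = [1, 1, 0, 1, 0, 1]
lenient = threshold_vote(hypothetical_calls, min_votes=2)  # favors recall
strict = threshold_vote(hypothetical_calls, min_votes=6)   # favors specificity
```

In this sketch the (statistic, p-value) pair would serve as the input features for each trained model, and raising `min_votes` trades recall for specificity, which is the stringency choice the abstract describes.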

Subjects :: Computer science
Operon
Biochemistry
Machine Learning
Mathematical and Statistical Techniques
Nucleic Acids
Biology (General)
RNA structure
Statistic
Statistical Data
Hyperparameter
Ecology
Applied Mathematics
Simulation and Modeling
Statistics
RNA, Bacterial
Computational Theory and Mathematics
Modeling and Simulation
Physical Sciences
Cellular Types
Algorithm
Algorithms
Research Article
Computer and Information Sciences
QH301-705.5
Research and Analysis Methods
Machine learning
Adaptability
Set (abstract data type)
Machine Learning Algorithms
Cellular and Molecular Neuroscience
Artificial Intelligence
Genetics
Code (cryptography)
Statistical Methods
Operons
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Flexibility (engineering)
Bacteria
Sequence Analysis, RNA
Biology and Life Sciences
Computational Biology
DNA
Cell Biology
Expression (mathematics)
Macromolecular structure analysis
Prokaryotic Cells
RNA
Artificial intelligence
Mathematics
Forecasting

Details

Language :: English
ISSN :: 1553-7358
Volume :: 18
Issue :: 1
Database :: OpenAIRE
Journal :: PLoS Computational Biology
Accession number :: edsair.doi.dedup.....7afa69cceceb53378ebb9703b84faa02
