Ab initio gene prediction for protein-coding regions.

Authors :
Baker L
David C
Jacobs DJ
Source :
Bioinformatics advances [Bioinform Adv] 2023 Aug 10; Vol. 3 (1), pp. vbad105. Date of Electronic Publication: 2023 Aug 10 (Print Publication: 2023).
Publication Year :
2023

Abstract

Motivation: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specificity levels in the low 60% range. A common weakness of most methods is the tendency to learn patterns that are species-specific to varying degrees. The need exists for methods to extract genetic features that can distinguish coding and noncoding regions that are not sensitive to specific organism characteristics.

Results: A new method based on a neural network (NN) that uses a collection of sensors to create input features is presented. It is shown that accurate predictions are achieved even when trained on organisms that are significantly different phylogenetically than test organisms. A consensus prediction algorithm for a CoDing Sequence (CDS) is subsequently applied to the first nucleotide level of NN predictions that boosts accuracy through a data-driven procedure that optimizes a CDS/non-CDS threshold. An aggregate accuracy benchmark at the nucleotide level shows that this new approach performs better than existing ab initio methods, while requiring significantly less training data.

Availability and Implementation: https://github.com/BioMolecularPhysicsGroup-UNCC/MachineLearning.

Competing Interests: None declared.

(© The Author(s) 2023. Published by Oxford University Press.)
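
The abstract mentions nucleotide-level sensitivity and specificity and a data-driven CDS/non-CDS threshold, but does not describe the procedure. The following is only a minimal illustrative sketch of how such a threshold could be chosen, not the authors' algorithm. All names (`sensitivity_specificity`, `best_threshold`) are hypothetical, the selection criterion (maximizing sensitivity plus specificity) is an assumption, and specificity is taken here as TN/(TN+FP), whereas some gene-prediction benchmarks instead report TP/(TP+FP).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity at a given threshold.

    scores: per-nucleotide CDS scores (e.g. hypothetical NN outputs in [0, 1]).
    labels: per-nucleotide ground truth, 1 = CDS, 0 = non-CDS.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_cds = score >= threshold
        if predicted_cds and label:
            tp += 1
        elif predicted_cds and not label:
            fp += 1
        elif not predicted_cds and label:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], candidates: Sequence[float]
) -> float:
    """Pick the candidate threshold maximizing sensitivity + specificity.

    This criterion is an assumption made for illustration only.
    """
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(scores, labels, t)),
    )


# Toy data: six nucleotides with made-up scores and labels.
scores = [0.1, 0.2, 0.8, 0.9, 0.4, 0.7]
labels = [0, 0, 1, 1, 0, 1]
chosen = best_threshold(scores, labels, [0.25, 0.5, 0.75])
print(chosen, sensitivity_specificity(scores, labels, chosen))
```

In practice the threshold would be tuned on held-out training data rather than on the sequences being evaluated.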

Details

Language :
English
ISSN :
2635-0041
Volume :
3
Issue :
1
Database :
MEDLINE
Journal :
Bioinformatics advances
Publication Type :
Academic Journal
Accession number :
37638212
Full Text :
https://doi.org/10.1093/bioadv/vbad105