[Regular Paper] Detection of Errors in Multi-genome Alignments Using Machine Learning Approaches

Authors :
Jaspal Singh
Ramchalam Kinattinkara Ramakrishnan
Mathieu Blanchette
Source :
2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE).
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Whole-genome multiple alignments are widely used in genomics and evolution, yet their accuracy is imperfect, due in part to the computational complexity of the alignment task. Identifying portions of these alignments that are likely to be incorrect would allow researchers either to work on improving them or to flag them for exclusion from downstream analyses. We introduce MSA-ED, a machine learning tool for the detection of errors in whole-genome multiple alignments. MSA-ED uses random forests or artificial neural networks to identify and classify several types of alignment errors. It is trained on labeled data obtained by using an evolution simulator to generate simulated orthologous sequences together with their correct alignment, and then comparing that correct alignment to the one produced by Multiz, a popular whole-genome aligner. Key to the success of MSA-ED is the engineering of several types of evolutionarily inspired features that boost prediction accuracy. We show that MSA-ED can detect certain types of errors with good accuracy. We then apply it to actual genomic alignments to identify putative alignment errors. Availability: https://github.com/jaspal1329/MSA-ED
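
The labeling step described in the abstract (comparing a known-correct simulated alignment with an aligner's output) can be illustrated with a minimal sketch. This is not MSA-ED's code: the function names, the gapped-string representation, and the choice of the first row as reference are all assumptions made for illustration. It marks a reference position as an error when any other sequence is paired with a different residue (or gap) than in the true alignment.

```python
def residue_indices(row):
    """Map each column of a gapped row to its residue index, or None for a gap."""
    indices, next_index = [], 0
    for symbol in row:
        if symbol == "-":
            indices.append(None)
        else:
            indices.append(next_index)
            next_index += 1
    return indices


def aligned_pairs(ref_row, other_row):
    """Return {reference residue index: partner residue index, or None if gapped}."""
    ref_idx = residue_indices(ref_row)
    other_idx = residue_indices(other_row)
    return {r: o for r, o in zip(ref_idx, other_idx) if r is not None}


def label_positions(true_aln, inferred_aln):
    """Label each reference residue: 1 = misaligned in inferred_aln, 0 = correct.

    Hypothetical helper. Both alignments are lists of gapped strings in the
    same species order, with the reference sequence in row 0.
    """
    true_maps = [aligned_pairs(true_aln[0], row) for row in true_aln[1:]]
    inferred_maps = [aligned_pairs(inferred_aln[0], row) for row in inferred_aln[1:]]
    n_ref = sum(1 for symbol in inferred_aln[0] if symbol != "-")
    labels = []
    for r in range(n_ref):
        wrong = any(t[r] != i[r] for t, i in zip(true_maps, inferred_maps))
        labels.append(1 if wrong else 0)
    return labels


# Toy example: the inferred alignment misses a gap in the second sequence.
true_alignment = ["ACGT-A", "AC-TTA"]
inferred_alignment = ["ACGTA", "ACTTA"]
labels = label_positions(true_alignment, inferred_alignment)
```

Labels of this kind, paired with per-position features, could then serve as training targets for a classifier such as a random forest; the specific error classes and features used by MSA-ED are not given in this record.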

Details

Database :
OpenAIRE
Journal :
2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE)
Accession number :
edsair.doi...........2faed34e766e0d13db117d409328537e