An Investigation of Information Structures in DNA

Authors :
Mohrmann, Joel
Publication Year :
2024

Abstract

DNA has long been known to carry information. One technique for quantifying the relationships within the information in DNA sequences is the average mutual information (AMI) profile, a measure from information theory. This investigation used principally the AMI profile, along with a few other metrics, to explore the structure of the information contained in DNA sequences. Treating DNA sequences as an information source, several computational methods were employed to model their information structure. Maximum likelihood and maximum a posteriori estimators were used to predict missing bases in DNA sequences. Other novel prediction methods, based on the AMI profile and its ability to evaluate the predictability of DNA bases, were also developed and tested for accuracy. The AMI profile was also adjusted to account for the triplet-code nature of DNA sequences. Additionally, machine-learning techniques such as neural networks, support vector machines, and principal component analysis were used to classify different regions of DNA sequences using the AMI profile and to compare coding versus noncoding regions. Finally, the analysis considered the relative frequency of groups of bases (known as k-mers) in DNA sequences. Arithmetic coding was explored as a way to compress DNA sequences, using a model based on the relative frequency of k-mers. It was concluded that the biological information stored in DNA is complex, yet this investigation provided methods to elucidate some of the character of the information structure of DNA sequences.

Advisor: Khalid Sayood
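
For readers unfamiliar with the AMI profile, the following is a minimal sketch of how such a profile can be computed. It is not taken from the dissertation: it assumes the common textbook definition, in which the value at lag k is the mutual information (in bits) between the base at position i and the base at position i + k, estimated from pair counts. The function name ami_profile and its parameters are hypothetical.

```python
from collections import Counter
from math import log2


def ami_profile(seq, max_lag):
    """Return [I(1), ..., I(max_lag)] in bits for a symbol sequence.

    Assumed definition (not confirmed by the abstract):
    I(k) = sum over x, y of p_k(x, y) * log2(p_k(x, y) / (p(x) * p(y))),
    where p_k(x, y) is the empirical probability of seeing x at
    position i and y at position i + k.
    """
    seq = seq.upper()
    n = len(seq)
    profile = []
    for k in range(1, max_lag + 1):
        pairs = n - k  # number of (i, i + k) position pairs
        if pairs <= 0:
            profile.append(0.0)
            continue
        # Joint counts of (base at i, base at i + k).
        joint = Counter(zip(seq, seq[k:]))
        # Marginal counts taken over the same pairs, so that the
        # estimate is a true mutual information and never negative.
        left = Counter(seq[:pairs])
        right = Counter(seq[k:])
        ami = 0.0
        for (x, y), count in joint.items():
            p_xy = count / pairs
            # p_xy / (p_x * p_y) simplifies to count * pairs / (left * right).
            ami += p_xy * log2(count * pairs / (left[x] * right[y]))
        profile.append(ami)
    return profile


if __name__ == "__main__":
    # A strictly periodic toy sequence: each base fully determines
    # the base k positions later, so every lag carries high AMI.
    print(ami_profile("ACGT" * 50, 6))
```

In this sketch a sequence of one repeated base gives zero at every lag, because knowing one base says nothing new about another. Real sequences would need a decision on how to treat ambiguous symbols such as N, which the sketch simply counts as a fifth symbol.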

Details

Language :
English
Database :
OpenDissertations
Publication Type :
Dissertation/Thesis
Accession number :
ddu.oai.digitalcommons.unl.edu.elecengtheses.1167