Spatial constraints and information content of sub-genomic regions of the human genome.

Authors :
Karakatsanis LP
Pavlos EG
Tsoulouhas G
Stamokostas GL
Mosbruger T
Duke JL
Pavlos GP
Monos DS
Source :
IScience [iScience] 2021 Jan 10; Vol. 24 (2), pp. 102048. Date of Electronic Publication: 2021 Jan 10 (Print Publication: 2021).
Publication Year :
2021

Abstract

Complexity metrics and machine learning (ML) models were used to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique), in order to ask how the segmental organization of the human genome is reflected in the size distribution of these sequences. For this purpose, we developed an integrated methodology based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a technical index that integrates the generated information, which we introduce and name the complexity factor (COFA). Our analysis revealed that the size distributions of the genomic regions within chromosomes are not random but follow patterns with characteristic features that are revealed through their complexity character, and that these patterns are part of the dynamics of the whole genome. Finally, this picture of dynamics in DNA is recognized using ML tools for clustering, classification, and prediction with high accuracy.

Competing Interests: The authors declare no competing interests.

© 2021 The Authors.

Details

Language :
English
ISSN :
2589-0042
Volume :
24
Issue :
2
Database :
MEDLINE
Journal :
IScience
Publication Type :
Academic Journal
Accession number :
33554061
Full Text :
https://doi.org/10.1016/j.isci.2021.102048