
Information Theory and Multivariate Techniques for Analyzing DNA Sequence Data: An Example from Tomato Genes

Authors :
Bal K Joshi
Dilip R Panthee
Source :
Nepal Journal of Biotechnology, Vol 1, Iss 1, Pp 1-8 (2011)
Publication Year :
2011
Publisher :
Biotechnology Society of Nepal, 2011.

Abstract

DNA and amino acid sequences are strings of alphabetic symbols with no underlying metric. Information theory offers one solution to this sequence metric problem. The reflection of DNA sequence complexity in phenotype stability might be useful for crop improvement. The Shannon-Weaver index (Shannon entropy, H') and the mutual information (MI) index were estimated from the DNA sequences of 22 tomato genes belonging to two gene families, namely disease resistance and fruit quality. The main objective was to use information theory and multivariate techniques to understand diversity among genes and to relate sequence complexity to phenotypes. The normalized H' value ranged from 0.429 to 0.461. The highest diversity was observed in the gene Crtr-B (beta carotene hydroxylase). Two principal components, which together accounted for 36.65% of the variation, placed these genes into four groups. Groupings of these genes by both principal component and cluster analyses clearly showed similarity at the phenotype level within clusters. Sequence similarity among genes was observed within a family. Diversity assessment of genes using information theory should be linked to understanding sequence complexity with respect to gene stability, for example the stability of resistance genes.
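
The two indices named in the abstract can be illustrated with a minimal sketch. It assumes H' is computed from single-nucleotide frequencies and MI from pairs of symbols a fixed distance k apart; the abstract does not state the exact estimators or the normalization behind the reported 0.429 to 0.461 range, so the function names, the choice of log base 2, and the lag parameter `k` are illustrative assumptions, and the output is raw (unnormalized) entropy.

```python
import math
from collections import Counter


def shannon_entropy(seq, base=2.0):
    """Shannon entropy H' of the symbol frequencies in seq (raw, not normalized)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def mutual_information(seq, k=1, base=2.0):
    """Mutual information between symbols k positions apart (assumed estimator)."""
    seq = seq.upper()
    pairs = list(zip(seq, seq[k:]))  # (symbol at i, symbol at i + k)
    if not pairs:
        raise ValueError("sequence is too short for the chosen lag k")
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)), base)
    return mi


if __name__ == "__main__":
    toy = "ATGGCGTACGTTAGC"  # made-up sequence, not one of the 22 tomato genes
    print(round(shannon_entropy(toy), 3), round(mutual_information(toy, k=1), 3))
```

With base 2 and four nucleotides, the raw entropy lies between 0 (a single repeated base) and 2 bits (all four bases equally frequent). Per-gene values such as these could then serve as input variables for the principal component and cluster analyses the abstract describes.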

Details

Language :
English
ISSN :
2091-1130 and 2467-9313
Volume :
1
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Nepal Journal of Biotechnology
Publication Type :
Academic Journal
Accession number :
edsdoj.3d2121a4558942478d10ac32f5061c1a
Document Type :
article
Full Text :
https://doi.org/10.3126/njb.v1i1.3867