Information Theory and Multivariate Techniques for Analyzing DNA Sequence Data: An Example from Tomato Genes
- Source :
- Nepal Journal of Biotechnology, Vol 1, Iss 1, Pp 1-8 (2011)
- Publication Year :
- 2011
- Publisher :
- Biotechnology Society of Nepal, 2011.
Abstract
- DNA and amino acid sequences are strings of alphabetic symbols with no underlying metric, and information theory offers one way to address this sequence metric problem. Understanding how DNA sequence complexity is reflected in phenotype stability might be useful for crop improvement. The Shannon-Weaver index (Shannon entropy, H') and the mutual information (MI) index were estimated from the DNA sequences of 22 genes belonging to two tomato gene families: disease resistance and fruit quality. The main objective was to use information theory and multivariate techniques to understand diversity among genes and to relate sequence complexity to phenotype. Normalized H' values ranged from 0.429 to 0.461, and the highest diversity was observed in the gene Crtr-B (beta-carotene hydroxylase). Two principal components, which together accounted for 36.65% of the variation, placed these genes into four groups. Groupings from both principal component and cluster analyses clearly showed phenotypic similarity within clusters, and sequence similarity was observed among genes within a family. Assessing gene diversity with information theory should help link sequence complexity to gene stability, for example the stability of resistance genes.
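The abstract names Shannon entropy (H') and mutual information (MI) but does not give the exact formulas used. As a hedged illustration only, the sketch below computes H' = -Σ p_i log2 p_i over a sequence's A/C/G/T composition, normalizes it by the 4-letter maximum of 2 bits, and computes MI between nucleotides at adjacent positions (i, i+1). The function names are hypothetical, and both the normalization and the adjacent-position choice for MI are assumptions; the reported normalized range (0.429 to 0.461) suggests the paper's scheme differs, so outputs from this sketch are not directly comparable to those values.

```python
import math
from collections import Counter


def clean(seq: str) -> str:
    """Keep only unambiguous nucleotides (assumption: N and other IUPAC codes are dropped)."""
    return "".join(c for c in seq.upper() if c in "ACGT")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy H' in bits of the single-nucleotide composition of seq."""
    counts = Counter(clean(seq))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_entropy(seq: str) -> float:
    """H' divided by its maximum for a 4-letter alphabet, log2(4) = 2 bits (assumed normalization)."""
    return shannon_entropy(seq) / math.log2(4)


def adjacent_mutual_information(seq: str) -> float:
    """Mutual information in bits between the nucleotide at position i and at i+1 (hypothetical choice)."""
    s = clean(seq)
    pairs = [(s[i], s[i + 1]) for i in range(len(s) - 1)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x_i, x_{i+1})
    left = Counter(a for a, _ in pairs)    # marginal counts of x_i
    right = Counter(b for _, b in pairs)   # marginal counts of x_{i+1}
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    example = "ATGGCTTCAGCATTCGGAATCCTA"  # hypothetical toy sequence, not from the paper
    print(round(normalized_entropy(example), 3))
    print(round(adjacent_mutual_information(example), 3))
```

A uniform composition such as "ACGT" gives a normalized H' of 1.0, while a homopolymer gives 0; a strictly alternating sequence like "ACACACAC" has high adjacent MI because each base fully predicts the next.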
Details
- Language :
- English
- ISSN :
- 2091-1130 and 2467-9313
- Volume :
- 1
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- Nepal Journal of Biotechnology
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.3d2121a4558942478d10ac32f5061c1a
- Document Type :
- article
- Full Text :
- https://doi.org/10.3126/njb.v1i1.3867