Discovering patterns and subfamilies in biosequences

Authors :
Brazma, A.
Jonassen, I.
Ukkonen, E.
Vilo, J.
Source :
Europe PubMed Central, Scopus-Elsevier

Abstract

We consider the problem of automatically discovering patterns and the corresponding subfamilies in a set of biosequences. The sequences are unaligned and may contain an unknown level of noise. The patterns are of the type used in the PROSITE database. Our approach discovers the patterns and their respective subfamilies simultaneously. We develop a theoretically substantiated significance measure for a set of such patterns, together with an algorithm that approximates the best pattern set and the subfamilies. The approach is based on the minimum description length (MDL) principle. We report a computational experiment that correctly finds subfamilies in the family of chromo domains and reveals new strong patterns.
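To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm or their significance measure. It assumes a simplified subset of PROSITE-style pattern syntax (single residues, `x` wildcards, bracketed residue groups, and `(n)` or `(n,m)` repeats), assigns each sequence to the first pattern it matches to form subfamilies, and compares a crude description length with and without the patterns. The function names (`prosite_to_regex`, `assign_subfamilies`, `description_length`) and the cost model (log2(20) bits per unconstrained residue) are hypothetical choices made for illustration only.

```python
import math
import re

BITS_PER_RESIDUE = math.log2(20)  # assumption: 20 equiprobable amino acids

# One pattern element: a residue, 'x', or a [group], with an optional repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simplified PROSITE-style pattern such as 'C-x(2)-[DE]-H'."""
    parts = []
    for elem in pattern.split("-"):
        m = _ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, lo, hi = m.groups()
        core = "." if core == "x" else core
        if lo is None:
            rep = ""
        elif hi is None:
            rep = "{%s}" % lo
        else:
            rep = "{%s,%s}" % (lo, hi)
        parts.append(core + rep)
    return re.compile("".join(parts))


def assign_subfamilies(sequences, patterns):
    """Group each sequence under the first pattern it matches."""
    regexes = [prosite_to_regex(p) for p in patterns]
    groups = {p: [] for p in patterns}
    unmatched = []
    for seq in sequences:
        for p, rx in zip(patterns, regexes):
            if rx.search(seq):
                groups[p].append(seq)
                break
        else:
            unmatched.append(seq)  # treated as noise in this toy model
    return groups, unmatched


def description_length(sequences, patterns):
    """Toy two-part code length in bits: patterns plus sequences given patterns."""
    groups, unmatched = assign_subfamilies(sequences, patterns)
    # Crude model cost: one residue's worth of bits per pattern element.
    total = sum(len(p.split("-")) for p in patterns) * BITS_PER_RESIDUE
    for p, members in groups.items():
        # Positions fixed to a single residue need not be encoded again.
        fixed = sum(1 for e in p.split("-") if re.fullmatch(r"[A-Z]", e))
        total += sum(len(s) - fixed for s in members) * BITS_PER_RESIDUE
    total += sum(len(s) for s in unmatched) * BITS_PER_RESIDUE
    return total
```

Under this toy cost model, a pattern set is worth keeping only if the bits it saves across its subfamily exceed the bits needed to state the patterns themselves, which is the general trade-off an MDL criterion expresses. The actual measure and search procedure are defined in the paper and are not reproduced here.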

Details

Database :
OpenAIRE
Journal :
Europe PubMed Central, Scopus-Elsevier
Accession number :
edsair.pmid.dedup....011ae331e83bb2a6d69626d98e07e0de