1. LASAGNA: A novel algorithm for transcription factor binding site alignment
- Author
- Chun-Hsi Huang and Chih Lee
- Subjects
Chromatin Immunoprecipitation; Sequence analysis; Sequence alignment; Nucleotide Motif; Biology; Biochemistry; DNA sequencing; Mice; 03 medical and health sciences; Structural Biology; Animals; Humans; Nucleotide Motifs; Molecular Biology; 030304 developmental biology; 0303 health sciences; Binding Sites; Genome; Methodology Article; Applied Mathematics; 030302 biochemistry & molecular biology; DNA; Sequence Analysis, DNA; Computer Science Applications; DNA metabolism; DNA binding site; DNA microarray; TRANSFAC; Sequence Alignment; Algorithm; Algorithms; Transcription Factors
- Abstract
Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs. We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences. We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
- Published
- 2013