1. TOPAAS, a Tomato and Potato Assembly Assistance System for Selection and Finishing of Bacterial Artificial Chromosomes
- Author
- Sander Peters, Taco P. Jesse, Thamara Hesselink, Dennis Woltinge, Kim Jansen, Jan C. van Haarst, Marleen H. C. Abma-Henkens, Marjo J. van Staveren, and René M. Klein-Lankhorst
- Subjects
Genetics, Amplified Fragment Length Polymorphism Analysis, Expressed sequence tag, Bacterial artificial chromosome, Contig, Physiology, food and beverages, Genomics, Plant Science, Biology, Genome, Amplified fragment length polymorphism, Synteny
- Abstract
We have developed the software package Tomato and Potato Assembly Assistance System (TOPAAS), which automates the assembly and scaffolding of contig sequences for low-coverage sequencing projects. The order of contigs predicted by TOPAAS is based on read pair information; alignments between genomic, expressed sequence tag, and bacterial artificial chromosome (BAC) end sequences; and annotated genes. The contig scaffold is used by TOPAAS for automated design of nonredundant sequence gap-flanking PCR primers. We show that TOPAAS builds reliable scaffolds for tomato (Solanum lycopersicum) and potato (Solanum tuberosum) BAC contigs that were assembled from shotgun sequences covering the target at 6- to 8-fold depth. More than 90% of the gaps are closed by sequence PCR, based on the predicted ordering information. TOPAAS also assists the selection of large genomic insert clones from BAC libraries for walking. For this, tomato BACs are screened by automated BLAST analysis and, in parallel, high-density nonselective amplified fragment length polymorphism fingerprinting is used for constructing a high-resolution BAC physical map. BLAST and amplified fragment length polymorphism analysis are then used together to determine the precise overlap. Assembly onto the seed BAC consensus confirms that the BACs are properly selected for having an extremely short overlap and the largest extending insert. This method will be particularly applicable where related or syntenic genomes are sequenced, as shown here for the Solanaceae, and is potentially useful for the monocots, Brassicaceae, and Leguminosae.
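The abstract names read pair information as one source of contig ordering. As a rough illustration only, the sketch below orders contigs by counting read pairs whose mates fall on different contigs and chaining the best-supported links first. It is not the TOPAAS implementation: the function names, the `min_links` threshold, and the input format (one tuple of contig names per read pair) are assumptions made for this example, and contig orientation, insert sizes, EST or BAC-end alignments, and gene annotation are all ignored.

```python
from collections import Counter


def count_links(read_pairs):
    """Count read pairs whose two mates land on different contigs."""
    links = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:
            links[frozenset((contig_a, contig_b))] += 1
    return links


def order_contigs(read_pairs, min_links=2):
    """Greedily chain contigs into linear scaffolds (hypothetical helper).

    Links are accepted from best to worst supported. A link is skipped if
    it would give a contig more than two neighbours or close a cycle.
    """
    links = count_links(read_pairs)
    degree = Counter()
    neighbors = {}
    parent = {}  # union-find forest used to detect cycles

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ranked = sorted(links.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, support in ranked:
        if support < min_links:
            break
        a, b = sorted(pair)
        if degree[a] >= 2 or degree[b] >= 2 or find(a) == find(b):
            continue
        parent[find(a)] = find(b)
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
        degree[a] += 1
        degree[b] += 1

    # Walk each chain from one of its endpoints (a contig with one neighbour).
    scaffolds, seen = [], set()
    for start in sorted(neighbors):
        if start in seen or degree[start] != 1:
            continue
        path, prev, cur = [start], None, start
        seen.add(start)
        while True:
            onward = [c for c in neighbors[cur] if c != prev]
            if not onward:
                break
            prev, cur = cur, onward[0]
            path.append(cur)
            seen.add(cur)
        scaffolds.append(path)
    return scaffolds


# Toy input: each tuple gives the contigs hit by the two mates of one clone.
pairs = [("c1", "c2"), ("c1", "c2"),
         ("c2", "c3"), ("c2", "c3"), ("c2", "c3"),
         ("c1", "c3"), ("c4", "c4")]
print(order_contigs(pairs))
```

In this toy input the single c1/c3 link falls below the threshold, so the contigs are chained as c1, c2, c3, and c4 stays unplaced because it has no links to other contigs. In a real project, each gap between adjacent contigs in such a chain would be a candidate for the gap-flanking PCR primers the abstract describes.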
- Published
- 2006