11 results on '"Kindlund E"'
Search Results
2. Database of Trypanosoma cruzi repeated genes: 20 000 additional gene variants
- Author
Ferella Marcela, Farzana Fatima, Nilsson Daniel, Kindlund Ellen, Arner Erik, Tammi Martti T, and Andersson Björn
- Subjects
- Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred. Results: We have developed a pipeline to estimate the genomic repeat content, where shotgun reads are aligned to the genomic sequence and the gene copy number is estimated using the average shotgun coverage. This method was applied to the genome of T. cruzi and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22 640 results were stored in a database available online. 18% of all protein coding sequences and pseudogenes were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicate a total gene copy number, including allelic gene variants, of over 40 000. Conclusion: Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that often are erroneous and possibly misleading.
Five repetitive genes were studied in more detail, in order to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.
- Published
- 2007
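The coverage-based estimate described in the abstract above can be illustrated with a minimal sketch. It assumes that copy number is approximated by dividing the mean read depth over a gene by the depth expected for a single-copy sequence. The function name, the gene names, and the depth values are hypothetical, and this is not the authors' pipeline, which is not detailed in the abstract.

```python
from statistics import mean


def estimate_copy_number(gene_depths, single_copy_depth):
    """Approximate gene copy number as mean depth / single-copy depth.

    gene_depths: per-base shotgun read depths over one annotated gene.
    single_copy_depth: depth expected for a sequence present once
    (an assumed, externally supplied baseline).
    """
    if not gene_depths:
        raise ValueError("gene_depths must not be empty")
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    return mean(gene_depths) / single_copy_depth


# Hypothetical per-base depths for two genes (illustrative values only).
depths = {
    "geneA": [10, 12, 8, 10],
    "geneB": [140, 150, 145, 165],
}
baseline = 10.0  # assumed single-copy depth

copies = {name: round(estimate_copy_number(d, baseline))
          for name, d in depths.items()}
print(copies)
```

In this toy input, a gene whose reads pile up at roughly fifteen times the baseline depth is reported as about fifteen copies, which is the kind of collapsed repeat the abstract describes.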
3. DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions
- Author
Kindlund Ellen, Tran Anh-Nhi, Tammi Martti T, Arner Erik, and Andersson Björn
- Subjects
- Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time consuming step in sequencing and current finishing tools are not designed with particular attention to the repeat problem. Results: We have developed DNPTrapper, a shotgun sequence finishing tool, specifically designed to address the problems posed by the presence of repeated regions in the target sequence. The program detects and visualizes single base differences between nearly identical repeat copies, and offers the overview and flexibility needed to rapidly resolve complex regions within a working session. The use of a database allows large amounts of data to be stored and handled, and allows viewing of mammalian size genomes. The program is available under an Open Source license. Conclusion: With DNPTrapper, it is possible to separate repeated regions that previously were considered impossible to resolve, and finishing tasks that previously took days or weeks can be resolved within hours or even minutes.
- Published
- 2006
4. Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants.
- Author
Arner E, Kindlund E, Nilsson D, Farzana F, Ferella M, Tammi MT, and Andersson B
- Subjects
- Amino Acid Sequence, Animals, Antigens, Surface genetics, Conserved Sequence, DNA, Protozoan, Gene Amplification, Gene Dosage, Genes, Protozoan physiology, Genome, Protozoan, Membrane Proteins genetics, Models, Biological, Molecular Sequence Data, Sequence Homology, Amino Acid, Databases, Genetic, Genetic Variation, Repetitive Sequences, Nucleic Acid, Trypanosoma cruzi genetics
- Abstract
Background: Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred. Results: We have developed a pipeline to estimate the genomic repeat content, where shotgun reads are aligned to the genomic sequence and the gene copy number is estimated using the average shotgun coverage. This method was applied to the genome of T. cruzi and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22,640 results were stored in a database available online. 18% of all protein coding sequences and pseudogenes were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicate a total gene copy number, including allelic gene variants, of over 40,000. Conclusion: Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that often are erroneous and possibly misleading.
Five repetitive genes were studied in more detail, in order to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.
- Published
- 2007
5. GRAT--genome-scale rapid alignment tool.
- Author
Kindlund E, Tammi MT, Arner E, Nilsson D, and Andersson B
- Subjects
- Algorithms, Animals, Chickens, Sequence Alignment instrumentation, Sequence Analysis, DNA, Software Design
- Abstract
Modern alignment methods designed to work rapidly and efficiently with large datasets often do so at the cost of method sensitivity. To overcome this, we have developed a novel alignment program, GRAT, built to accurately align short, highly similar DNA sequences. The program runs rapidly and requires no more memory and CPU power than a desktop computer. In addition, specificity is ensured by statistically separating the true alignments from spurious matches using phred quality values. An efficient separation is especially important when searching large datasets and whenever there are repeats present in the dataset. Results are superior in comparison to widely used existing software, and analysis of two large genomic datasets show the usefulness and scalability of the algorithm.
- Published
- 2007
6. DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions.
- Author
Arner E, Tammi MT, Tran AN, Kindlund E, and Andersson B
- Subjects
- Base Sequence, DNA analysis, DNA chemistry, Molecular Sequence Data, Algorithms, DNA genetics, Documentation methods, Repetitive Sequences, Nucleic Acid genetics, Sequence Analysis, DNA methods, Software, User-Computer Interface
- Abstract
Background: Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time consuming step in sequencing and current finishing tools are not designed with particular attention to the repeat problem. Results: We have developed DNPTrapper, a shotgun sequence finishing tool, specifically designed to address the problems posed by the presence of repeated regions in the target sequence. The program detects and visualizes single base differences between nearly identical repeat copies, and offers the overview and flexibility needed to rapidly resolve complex regions within a working session. The use of a database allows large amounts of data to be stored and handled, and allows viewing of mammalian size genomes. The program is available under an Open Source license. Conclusion: With DNPTrapper, it is possible to separate repeated regions that previously were considered impossible to resolve, and finishing tasks that previously took days or weeks can be resolved within hours or even minutes.
- Published
- 2006
7. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease.
- Author
El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, Westenberger SJ, Caler E, Cerqueira GC, Branche C, Haas B, Anupama A, Arner E, Aslund L, Attipoe P, Bontempi E, Bringaud F, Burton P, Cadag E, Campbell DA, Carrington M, Crabtree J, Darban H, da Silveira JF, de Jong P, Edwards K, Englund PT, Fazelina G, Feldblyum T, Ferella M, Frasch AC, Gull K, Horn D, Hou L, Huang Y, Kindlund E, Klingbeil M, Kluge S, Koo H, Lacerda D, Levin MJ, Lorenzi H, Louie T, Machado CR, McCulloch R, McKenna A, Mizuno Y, Mottram JC, Nelson S, Ochaya S, Osoegawa K, Pai G, Parsons M, Pentony M, Pettersson U, Pop M, Ramirez JL, Rinta J, Robertson L, Salzberg SL, Sanchez DO, Seyler A, Sharma R, Shetty J, Simpson AJ, Sisk E, Tammi MT, Tarleton R, Teixeira S, Van Aken S, Vogt C, Ward PN, Wickstead B, Wortman J, White O, Fraser CM, Stuart KD, and Andersson B
- Subjects
- Animals, Chagas Disease drug therapy, Chagas Disease parasitology, DNA Repair, DNA Replication, DNA, Mitochondrial genetics, DNA, Protozoan genetics, Genes, Protozoan, Humans, Meiosis, Membrane Proteins chemistry, Membrane Proteins genetics, Membrane Proteins physiology, Multigene Family, Protozoan Proteins chemistry, Protozoan Proteins physiology, Recombination, Genetic, Repetitive Sequences, Nucleic Acid, Retroelements, Signal Transduction, Telomere genetics, Trypanocidal Agents pharmacology, Trypanocidal Agents therapeutic use, Trypanosoma cruzi chemistry, Trypanosoma cruzi physiology, Genome, Protozoan, Protozoan Proteins genetics, Sequence Analysis, DNA, Trypanosoma cruzi genetics
- Abstract
Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
- Published
- 2005
8. A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms.
- Author
Wong GK, Liu B, Wang J, Zhang Y, Yang X, Zhang Z, Meng Q, Zhou J, Li D, Zhang J, Ni P, Li S, Ran L, Li H, Zhang J, Li R, Li S, Zheng H, Lin W, Li G, Wang X, Zhao W, Li J, Ye C, Dai M, Ruan J, Zhou Y, Li Y, He X, Zhang Y, Wang J, Huang X, Tong W, Chen J, Ye J, Chen C, Wei N, Li G, Dong L, Lan F, Sun Y, Zhang Z, Yang Z, Yu Y, Huang Y, He D, Xi Y, Wei D, Qi Q, Li W, Shi J, Wang M, Xie F, Wang J, Zhang X, Wang P, Zhao Y, Li N, Yang N, Dong W, Hu S, Zeng C, Zheng W, Hao B, Hillier LW, Yang SP, Warren WC, Wilson RK, Brandström M, Ellegren H, Crooijmans RP, van der Poel JJ, Bovenhuis H, Groenen MA, Ovcharenko I, Gordon L, Stubbs L, Lucas S, Glavina T, Aerts A, Kaiser P, Rothwell L, Young JR, Rogers S, Walker BA, van Hateren A, Kaufman J, Bumstead N, Lamont SJ, Zhou H, Hocking PM, Morrice D, de Koning DJ, Law A, Bartley N, Burt DW, Hunt H, Cheng HH, Gunnarsson U, Wahlberg P, Andersson L, Kindlund E, Tammi MT, Andersson B, Webber C, Ponting CP, Overton IM, Boardman PE, Tang H, Hubbard SJ, Wilson SA, Yu J, Wang J, and Yang H
- Subjects
- Alleles, Amino Acid Sequence, Animals, Animals, Domestic classification, Animals, Domestic genetics, Chickens classification, Chromosomes genetics, Female, Haplotypes genetics, Humans, Molecular Sequence Data, Ornithine Carbamoyltransferase chemistry, Selection, Genetic, Chickens genetics, Genome, Genomics, Physical Chromosome Mapping, Polymorphism, Single Nucleotide genetics
- Abstract
We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the variant sites are true SNPs, and at least 70% are common SNPs that segregate in many domestic breeds. Mean nucleotide diversity is about five SNPs per kilobase for almost every possible comparison between red jungle fowl and domestic lines, between two different domestic lines, and within domestic lines--in contrast to the notion that domestic animals are highly inbred relative to their wild ancestors. In fact, most of the SNPs originated before domestication, and there is little evidence of selective sweeps for adaptive alleles on length scales greater than 100 kilobases.
- Published
- 2004
9. Some microsatellites may act as novel polymorphic cis-regulatory elements through transcription factor binding.
- Author
Iglesias AR, Kindlund E, Tammi M, and Wadelius C
- Subjects
- Base Sequence, Binding Sites genetics, Chromatography, High Pressure Liquid methods, Competitive Bidding, DNA chemistry, DNA genetics, DNA metabolism, DNA-Binding Proteins metabolism, Databases, Nucleic Acid, Electrophoretic Mobility Shift Assay, Genotype, HeLa Cells, Humans, Molecular Sequence Data, Oligonucleotides genetics, Oligonucleotides metabolism, Polymorphism, Genetic, Promoter Regions, Genetic genetics, Protein Binding, Sequence Analysis, DNA, Sequence Homology, Nucleic Acid, Sp1 Transcription Factor metabolism, Microsatellite Repeats genetics, Regulatory Sequences, Nucleic Acid genetics, Transcription Factors metabolism
- Abstract
Although microsatellites with functional effects have been described, generally, these repeats are considered as "junk" DNA in the same way as other repetitive sequences. Our aim was to investigate if certain microsatellites can have a functional role as cis-regulatory elements. A database was created of all short tandem repeats, from 2 to 10 bases, located in the first 10-kb 5' of the transcription start sites of all annotated genes of the human genome. Of 114 microsatellites selected based on their size and location in the promoter, 51 were found to be polymorphic. Using electrophoretic mobility shift assay (EMSA), we studied five repetitive motifs and three displayed specific protein binding which were found in 12 of the polymorphic microsatellites. An interesting microsatellite is the CTC/GAG repeat which, as double-stranded (DS) DNA, bound specificity protein 1 (SP1) with high affinity, formed triplexes in vitro and displayed differences in SP1 binding and triplex formation capacity for repeats with distinct numbers of repeat units. Interestingly, the polypyrimidine strand of the repeat (CTC) bound other proteins such as polypyrimidine tract-binding protein 1 (PTBP1) as single-stranded (SS) DNA, and a model with two alternative DNA conformations is proposed for these repeats. Distinct protein binding to DS DNA was also observed for different numbers of AAACA and AAAAT repeats. Our results suggest that certain microsatellites may act as cis-regulatory elements, controlling gene expression through transcription factor binding and/or secondary DNA structure formation. Due to their high polymorphism and abundance, they might represent an important source of quantitative genetic variation.
- Published
- 2004
10. ReDiT: Repeat Discrepancy Tagger--a shotgun assembly finishing aid.
- Author
Tammi MT, Arner E, Kindlund E, and Andersson B
- Subjects
- Algorithms, Base Sequence, Computer Graphics, Gene Expression Profiling, Genome, Molecular Sequence Data, Sequence Alignment methods, Word Processing methods, Chromosome Mapping methods, Documentation methods, Expressed Sequence Tags, Repetitive Sequences, Nucleic Acid genetics, Sequence Analysis, DNA methods, Software, User-Computer Interface
- Abstract
Finishing, i.e. gap closure and editing, is the most time-consuming part of genome sequencing. Repeated sequences together with sequencing errors complicate the assembly and often result in misassemblies that are difficult to correct. Repeat Discrepancy Tagger (ReDiT) is a tool designed to aid in the finishing step. This software processes assembly results produced by any fragment assembly program that outputs ace files. The input sequences are analyzed to determine possible differences between repeated sequences. The output is written as tags in an ace file that can be viewed by, e.g. the Consed sequence editor. Availability: The ReDiT program is freely available at http://web.cgb.ki.se/redit
- Published
- 2004
11. Correcting errors in shotgun sequences.
- Author
Tammi MT, Arner E, Kindlund E, and Andersson B
- Subjects
- Algorithms, Genome, Repetitive Sequences, Nucleic Acid, Sequence Alignment methods, Software, Time Factors, Sequence Analysis, DNA methods
- Abstract
Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
- Published
- 2003
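The idea of telling true single base differences apart from sequencing errors in a multiple alignment can be illustrated with a rough sketch. It assumes a simplified column-count heuristic: a column is flagged as a candidate difference between repeat copies only when at least two distinct bases are each supported by a minimum number of reads, on the reasoning that an isolated mismatch is more likely a base-calling error. This is a stand-in for illustration, not the published DNP method or the MisEd software; the function name, the `min_support` parameter, and the reads are hypothetical.

```python
from collections import Counter


def candidate_difference_columns(alignment, min_support=2):
    """Return column indices where two or more bases each occur in
    at least `min_support` reads.

    alignment: list of equal-length strings, '-' marking gaps.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned reads must have the same length")

    columns = []
    for i in range(width):
        # Count bases in this column, ignoring gap characters.
        counts = Counter(row[i] for row in alignment if row[i] != "-")
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            columns.append(i)
    return columns


# Hypothetical aligned reads: column 2 differs in two reads (G vs C),
# column 4 differs in a single read only.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",
    "ACCTAC",
    "ACGTTC",
]
flagged = candidate_difference_columns(reads)
print(flagged)
```

With these reads, only the column where two reads agree on the alternative base is flagged; the mismatch seen in a single read is left to be treated as a probable error.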