544 results on '"Interspersed Repeat"'
Search Results
2. Alu insertion variants alter gene transcript levels
- Author
-
Maria S. Kryatova, Giacomo Grillo, Lindsay M. Payer, Pedro P. Rocha, Mathieu Lupien, Kathleen H. Burns, and Jared P. Steranka
- Subjects
Genetics ,Structural variation ,Reporter gene ,Regulatory sequence ,Expression quantitative trait loci ,Interspersed repeat ,Haplotype ,Alu element ,Biology ,Gene ,Genetics (clinical)
- Abstract
Alu are high copy number interspersed repeats that have accumulated near genes during primate and human evolution. They are a pervasive source of structural variation in modern humans. Impacts that Alu insertions may have on gene expression are not well understood, although some have been associated with expression quantitative trait loci (eQTLs). Here, we directly test regulatory effects of polymorphic Alu insertions in isolation of other variants on the same haplotype. To screen insertion variants for those with such effects, we used ectopic luciferase reporter assays and evaluated 110 Alu insertion variants, including more than 40 with a potential role in disease risk. We observed a continuum of effects with significant outliers that up- or down-regulate luciferase activity. Using a series of reporter constructs, which included genomic context surrounding the Alu, we can distinguish between instances in which the Alu disrupts another regulator and those in which the Alu introduces new regulatory sequence. We next focused on three polymorphic Alu loci associated with breast cancer that display significant effects in the reporter assay. We used CRISPR to modify the endogenous sequences, establishing cell lines varying in the Alu genotype. Our findings indicate that Alu genotype can alter expression of genes implicated in cancer risk, including PTHLH, RANBP9, and MYC. These data show that commonly occurring polymorphic Alu elements can alter transcript levels and potentially contribute to disease risk.
- Published
- 2021
3. Analysis of pir gene expression across the Plasmodium life cycle
- Author
-
Carlos Talavera Lopez, George K. Christophides, Timothy S. Little, Caroline Hosking, Sarah I. Amis, Sarah McLaughlin, Deirdre Cunningham, John W.G. Addy, Christopher Alder, Adam J. Reid, Audrey Vandomme, and Jean Langhorne
- Subjects
Model organisms ,Subfamily ,Plasmodium berghei ,Genes, Protozoan ,Interspersed repeat ,Immunology ,RC955-962 ,Gene Expression ,Infectious Disease ,Infectious and parasitic diseases ,RC109-216 ,Biology ,Genome ,Plasmodium chabaudi ,03 medical and health sciences ,0302 clinical medicine ,Arctic medicine. Tropical medicine ,Gametocyte ,Antigenic variation ,Gene ,030304 developmental biology ,Genetics ,Life Cycle Stages ,0303 health sciences ,Research ,FOS: Clinical medicine ,biology.organism_classification ,Infectious Diseases ,Multigene Family ,Parasitology ,030217 neurology & neurosurgery
- Abstract
Background Plasmodium interspersed repeat (pir) is the largest multigene family in the genomes of most Plasmodium species. A variety of functions for the PIR proteins which they encode have been proposed, including antigenic variation, immune evasion, sequestration and rosetting. However, direct evidence for these is lacking. The repetitive nature of the family has made it difficult to determine function experimentally. However, there has been some success in using gene expression studies to suggest roles for some members in virulence and chronic infection. Methods Here pir gene expression was examined across the life cycle of Plasmodium berghei using publicly available RNAseq data-sets, and at high resolution in the intraerythrocytic development cycle using new data from Plasmodium chabaudi. Results Expression of pir genes is greatest in stages of the parasite which invade and reside in red blood cells. The marked exception is that liver merozoites and male gametocytes produce a very large number of pir gene transcripts, notably compared to female gametocytes, which produce relatively few. Within the asexual blood stages different subfamilies peak at different times, suggesting further functional distinctions. Representing a subfamily of its own, the highly conserved ancestral pir gene warrants further investigation due to its potential tractability for functional investigation. It is highly transcribed in multiple life cycle stages and across most studied Plasmodium species and thus is likely to play an important role in parasite biology. Conclusions The identification of distinct expression patterns for different pir genes and subfamilies is likely to provide a basis for the design of future experiments to uncover their function.
- Published
- 2021
4. Transposable element subfamily annotation has a reproducibility problem
- Author
-
Kaitlin M. Carey, Gilia Patterson, and Travis J. Wheeler
- Subjects
Transposable element ,Subfamily ,lcsh:QH426-470 ,Research ,Segmental duplications ,Interspersed repeat ,Computational biology ,Biology ,Human genetics ,Interspersed repeats ,Annotation ,lcsh:Genetics ,Subfamilies ,Human genome ,Homologous recombination ,Transposable elements ,Molecular Biology ,Segmental duplication
- Abstract
Background Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. Results We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability. Conclusions The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis.
- Published
- 2021
5. Unusually efficient CUG initiation of an overlapping reading frame in POLG mRNA yields novel protein POLGARF
- Author
-
Marina V. Serebryakova, Petr N. Datskevich, Audrey M. Michel, Ivan N. Shatsky, Maria S Mikhaylova, Patrick B F O'Connor, John F. Atkins, Pavel V. Baranov, Sergey I. Kovalchuk, Dmitri B. Papkovsky, Stephen J Kiniry, Gary Loughran, Dmitry E. Andreev, Alexander V. Zhdanov, and Fedor N. Rozov
- Subjects
Reading Frames ,Mitochondrial DNA ,Protein subunit ,Interspersed repeat ,Codon, Initiator ,Biology ,Mitochondrial Proteins ,Open Reading Frames ,Eukaryotic translation ,Pregnancy ,Animals ,Humans ,RNA, Messenger ,Gene ,Phylogeny ,Polymerase ,Genetics ,Messenger RNA ,Multidisciplinary ,Base Sequence ,Biological Sciences ,DNA Polymerase gamma ,Open reading frame ,Protein Biosynthesis ,biology.protein ,Female ,Carrier Proteins
- Abstract
While near-cognate codons are frequently used for translation initiation in eukaryotes, their efficiencies are usually low (
- Published
- 2020
6. Tumor DNA hypomethylation of LINE‑1 is associated with low tumor grade of breast cancer in Tunisian patients
- Author
-
Hayet Radia Zeggar, K. Rahal, Alexandre How Kit, Olfa Adouni, Jean-François Deleuze, Hayet Douik, Maher Kharrat, Antoine Daunay, Mourad Sahbatou, Ilhem Bettaieb, and A. Gammoudi
- Subjects
0301 basic medicine ,Cancer Research ,Interspersed repeat ,Alu element ,Cancer ,Retrotransposon ,Articles ,Biology ,medicine.disease ,medicine.disease_cause ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,Breast cancer ,Oncology ,030220 oncology & carcinogenesis ,DNA methylation ,Cancer research ,medicine ,Carcinogenesis ,DNA hypomethylation
- Abstract
DNA hypomethylation of the long interspersed repetitive DNA retrotransposon (LINE-1) and of Alu repeat elements of the short interspersed element (SINE) family is an early event in carcinogenesis that causes transcriptional activation and leads to chromosomal instability. In the current study, DNA methylation levels of LINE-1 and Alu repeats were analyzed in tumoral tissues of invasive breast cancer in a Tunisian cohort, and their association with the clinicopathological features of patients was defined. DNA methylation of LINE-1 and Alu repeats was analyzed using pyrosequencing in 61 invasive breast cancers. Median values observed for DNA methylation of LINE-1 and Alu repeats were considered as the cut-off (59.81 and 18.49%, respectively). The results of the current study demonstrated a positive correlation between DNA methylation levels of LINE-1 and Alu repeats (rho=0.284; P
- Published
- 2020
7. Locus-specific chromatin profiling of evolutionarily young transposable elements
- Author
-
Darren Taylor, Robert Lowe, Claude Philippe, Kevin C. L. Cheng, Olivia A. Grant, Nicolae Radu Zabet, Gael Cristofari, Miguel R. Branco, Barts & The London School of Medicine and Dentistry [London, UK] (Blizard Institute), Queen Mary University of London (QMUL), Institut de Recherche sur le Cancer et le Vieillissement (IRCAN), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), ANR-11-LABX-0028,SIGNALIFE,Réseau d'Innovation sur les Voies de Signalisation en Sciences de la Vie(2011), ANR-16-CE12-0020,RETROMET,Rendre unique l'ADN répété ou comment révéler la régulation épigénétique des rétrotransposons L1 dans les cellules somatiques humaines à une résolution inégalée.(2016), ANR-19-CE12-0032,ImpacTE,Réseau de régulation et élément LINE-1 : impact global des éléments transposables récents sur l'activité génique chez les Mammifères(2019), Université Nice Sophia Antipolis (1965 - 2019) (UNS), Cristofari, Gael, Centres d'excellences - Réseau d'Innovation sur les Voies de Signalisation en Sciences de la Vie - - SIGNALIFE2011 - ANR-11-LABX-0028 - LABX - VALID, Rendre unique l'ADN répété ou comment révéler la régulation épigénétique des rétrotransposons L1 dans les cellules somatiques humaines à une résolution inégalée. - - RETROMET2016 - ANR-16-CE12-0020 - AAPG2016 - VALID, and Réseau de régulation et élément LINE-1 : impact global des éléments transposables récents sur l'activité génique chez les Mammifères - - ImpacTE2019 - ANR-19-CE12-0032 - AAPG2019 - VALID
- Subjects
Transposable element ,[SDV]Life Sciences [q-bio] ,Interspersed repeat ,Locus (genetics) ,[SDV.GEN] Life Sciences [q-bio]/Genetics ,Computational biology ,Biology ,Genome ,03 medical and health sciences ,Mice ,0302 clinical medicine ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,Genetics ,Animals ,Humans ,Epigenetics ,Epigenomics ,030304 developmental biology ,Regulation of gene expression ,0303 health sciences ,[SDV.GEN]Life Sciences [q-bio]/Genetics ,[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Genomics ,Repetitive dna ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Chromatin ,[SDV] Life Sciences [q-bio] ,Gene Expression Regulation ,Epigenetics and chromatin ,DNA Transposable Elements ,[SDV.BBM.GTP] Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,Transposable elements ,030217 neurology & neurosurgery
- Abstract
Despite a vast expansion in the availability of epigenomic data, our knowledge of the chromatin landscape at interspersed repeats remains highly limited by difficulties in mapping short-read sequencing data to these regions. In particular, little is known about the locus-specific regulation of evolutionarily young transposable elements (TEs), which have been implicated in genome stability, gene regulation and innate immunity in a variety of developmental and disease contexts. Here we propose an approach for generating locus-specific protein–DNA binding profiles at interspersed repeats, which leverages information on the spatial proximity between repetitive and non-repetitive genomic regions. We demonstrate that the combination of HiChIP and a newly developed mapping tool (PAtChER) yields accurate protein enrichment profiles at individual repetitive loci. Using this approach, we reveal previously unappreciated variation in the epigenetic profiles of young TE loci in mouse and human cells. Insights gained using our method will be invaluable for dissecting the molecular determinants of TE regulation and their impact on the genome.
- Published
- 2021
8. Primer Binding Site (PBS) Profiling of Genetic Diversity of Natural Populations of Endemic Species Allium ledebourianum Schult
- Author
-
Asem Tumenbayeva, Ainur Turzhanova, O.N. Khapilina, Yuri Kotukhov, Vladislav Shevtsov, Ruslan Kalendar, Alevtina Danilova, Institute of Biotechnology, Crop Science Research Group, and Department of Agricultural Sciences
- Subjects
0106 biological sciences ,Allium ledebourianum Schult ,Rare species ,Population ,Interspersed repeat ,Biomedical Engineering ,Biodiversity ,Bioengineering ,Biology ,01 natural sciences ,Applied Microbiology and Biotechnology ,Biochemistry ,03 medical and health sciences ,chemistry.chemical_compound ,iPBS amplification ,Molecular marker ,education ,030304 developmental biology ,2. Zero hunger ,molecular marker ,0303 health sciences ,Genetic diversity ,education.field_of_study ,Small population size ,genetic diversity ,15. Life on land ,DNA profiling ,11831 Plant biology ,chemistry ,Evolutionary biology ,1181 Ecology, evolutionary biology ,TP248.13-248.65 ,010606 plant biology & botany ,Biotechnology
- Abstract
Endemic species are especially vulnerable to biodiversity loss caused by isolation or habitat specificity, small population size, and anthropogenic factors. Endemic species biodiversity analysis has a critically important global value for the development of conservation strategies. The rare onion Allium ledebourianum is a narrow-lined endemic species, with natural populations located in the extreme climatic conditions of the Kazakh Altai. A. ledebourianum populations are decreasing everywhere due to anthropogenic impact, and therefore, this species requires preservation and protection. Conservation of this rare species is associated with monitoring studies to investigate the genetic diversity of natural populations. Fundamental components of eukaryote genome include multiple classes of interspersed repeats. Various PCR-based DNA fingerprinting methods are used to detect chromosomal changes related to recombination processes of these interspersed elements. These methods are based on interspersed repeat sequences and are an effective approach for assessing the biological diversity of plants and their variability. We applied DNA profiling approaches based on conservative sequences of interspersed repeats to assess the genetic diversity of natural A. ledebourianum populations located in the territory of Kazakhstan Altai. The analysis of natural A. ledebourianum populations, carried out using the DNA profiling approach, allowed the effective differentiation of the populations and assessment of their genetic diversity. We used conservative sequences of tRNA primer binding sites (PBS) of the long-terminal repeat (LTR) retrotransposons as PCR primers. Amplification using the three most effective PBS primers generated 628 PCR amplicons, with an average of 209 amplicons. The average polymorphism level varied from 34% to 40% for all studied samples. 
Resolution analysis of the PBS primers showed all of them to have high or medium polymorphism levels, which varied from 0.763 to 0.965. Results of the molecular analysis of variance showed that the general biodiversity of A. ledebourianum populations is due to interpopulation (67%) and intrapopulation (33%) differences. The revealed genetic diversity was higher in the most distant population of A. ledebourianum LD64, located on the Sarymsakty ridge of Southern Altai. This is the first genetic diversity study of the endemic species A. ledebourianum using DNA profiling approaches. This work allowed us to collect new genetic data on the structure of A. ledebourianum populations in the Altai for subsequent development of preservation strategies to enhance the reproduction of this relict species. The results will be useful for the conservation and exploitation of this species, serving as the basis for further studies of its evolution and ecology.
- Published
- 2021
9. Genome architecture and stability in the Saccharomyces cerevisiae knockout collection
- Author
-
Alexandra Selivanova, Rong Li, Roded Sharan, Shir Klein-Lavi, Mareike Herzog, Jin Zhu, Iñigo Ayestaran, Israel Salguero, Stephen P. Jackson, Siyue Wang, Martin Kupiec, Roi Meirman, Molly Gordon, Gonzalo Millán-Zambrano, and Fabio Puddu
- Subjects
Genome instability ,Whole genome sequencing ,0303 health sciences ,Mitochondrial DNA ,Multidisciplinary ,biology ,Saccharomyces cerevisiae ,Interspersed repeat ,Fungal genetics ,Computational biology ,biology.organism_classification ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Gene ,030217 neurology & neurosurgery ,030304 developmental biology
- Abstract
Despite major progress in defining the functional roles of genes, a complete understanding of their influences is far from being realized, even in relatively simple organisms. A major milestone in this direction arose via the completion of the yeast Saccharomyces cerevisiae gene-knockout collection (YKOC), which has enabled high-throughput reverse genetics, phenotypic screenings and analyses of synthetic-genetic interactions [1–3]. Ensuing experimental work has also highlighted some inconsistencies and mistakes in the YKOC, or genome instability events that rebalance the effects of specific knockouts [4–6], but a complete overview of these is lacking. The identification and analysis of genes that are required for maintaining genomic stability have traditionally relied on reporter assays and on the study of deletions of individual genes, but whole-genome-sequencing technologies now enable, in principle, the direct observation of genome instability globally and at scale. To exploit this opportunity, we sequenced the whole genomes of nearly all of the 4,732 strains comprising the homozygous diploid YKOC. Here, by extracting information on copy-number variation of tandem and interspersed repetitive DNA elements, we describe, for almost every single non-essential gene, the genomic alterations that are induced by its loss. Analysis of this dataset reveals genes that affect the maintenance of various genomic elements, highlights cross-talks between nuclear and mitochondrial genome stability, and shows how strains have genetically adapted to life in the absence of individual non-essential genes. Whole-genome sequencing of the strains of the Saccharomyces cerevisiae gene-knockout collection reveals the effects of the deletion of non-essential genes on genome stability.
- Published
- 2019
10. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection
- Author
-
Chun Liang and Jieming Shi
- Subjects
0106 biological sciences ,Transposable element ,Retroelements ,Physiology ,Inverted repeat ,Interspersed repeat ,Retrotransposon ,Plant Science ,Computational biology ,01 natural sciences ,Genome ,Arabidopsis ,Genetics ,Direct repeat ,biology ,Terminal Repeat Sequences ,Computational Biology ,food and beverages ,biology.organism_classification ,Long terminal repeat ,DNA Transposable Elements ,Algorithms ,Genome, Plant ,Software ,Research Article ,010606 plant biology & botany
- Abstract
Comprehensive and accurate annotation of the repeatome, including transposons, is critical for deepening our understanding of repeat origins, biogenesis, regulatory mechanisms, and roles. Here, we developed Generic Repeat Finder (GRF), a tool for genome-wide repeat detection based on fast, exhaustive numerical calculation algorithms integrated with optimized dynamic programming strategies. GRF sensitively identifies terminal inverted repeats (TIRs), terminal direct repeats (TDRs), and interspersed repeats that bear both inverted and direct repeats. GRF also detects DNA or RNA transposable elements characterized by these repeats in plant and animal genomes. For TIRs and TDRs, GRF identifies spacers in the middle and mismatches/insertions or deletions in terminal repeats, showing their alignment or base-pairing information. GRF helps improve the annotation for various DNA transposons and retrotransposons, such as miniature inverted-repeat transposable elements (MITEs), long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons, including long interspersed nuclear elements and short interspersed nuclear elements in plants. We used GRF to perform TIR/TDR, interspersed-repeat, and MITE detection in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and mouse (Mus musculus). As a generic bioinformatics tool in repeat finding implemented as a parallelized C++ program, GRF was faster and more sensitive than the existing inverted repeat/MITE detection tools based on numerical approaches (i.e. detectIR and detectMITE) in Arabidopsis and mouse. GRF is more sensitive than Inverted Repeat Finder in TIR detection, LTR_FINDER in short TDR detection (≤1,000 nt), and phRAIDER in interspersed repeat detection in Arabidopsis and rice. GRF is open source and available from GitHub.
- Published
- 2019
11. Resolving repeat families with long reads
- Author
-
Philipp Bongartz
- Subjects
Transposable element ,Computer science ,Contiguity ,Interspersed repeat ,Sequence assembly ,Repeat resolution ,Computational biology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Repeat families ,Structural Biology ,Databases, Genetic ,Humans ,Molecular Biology ,lcsh:QH301-705.5 ,030304 developmental biology ,0303 health sciences ,Genome assembly ,Contig ,Applied Mathematics ,Methodology Article ,Chromosome ,Sequence Analysis, DNA ,Computer Science Applications ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,lcsh:R858-859.7 ,DNA microarray ,Algorithms
- Abstract
Background Draft quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. Results We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long read repeat resolution tool and show the improved accuracy of our method. Conclusions Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. Electronic supplementary material The online version of this article (10.1186/s12859-019-2807-4) contains supplementary material, which is available to authorized users.
- Published
- 2019
12. A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon
- Author
-
Sigmund Ramberg, Tone-Kari K Østbye, Rune Andreassen, and Bjørn Høyheim
- Subjects
0106 biological sciences ,0301 basic medicine ,Atlantic salmon ,Interspersed repeat ,Sequence assembly ,Genomics ,Computational biology ,QH426-470 ,Biology ,01 natural sciences ,Transcriptomes ,Transcriptome ,03 medical and health sciences ,hybrid error correction ,RefSeq ,Genetics ,Salmo ,Illumina dye sequencing ,Genetics (clinical) ,Original Research ,Whole genome sequencing ,PacBio Iso-seq ,Illumina sequencing ,Hybrid error corrections ,biology.organism_classification ,030104 developmental biology ,PacBio Iso-sequences ,Molecular Medicine ,full-length mRNA ,transcriptome ,Full-length mRNAs ,010606 plant biology & botany
- Abstract
Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows for full-length transcript sequencing from single RNA-molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads by short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remaining were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. 
More than 80% of transcripts were assigned GO terms and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic Salmon or applied for similar projects in other species.
- Published
- 2021
13. SquiggleNet: real-time, direct classification of nanopore signals
- Author
-
Yuwei Bao, Torrin L. McDonald, Robert P. Dickson, Joshua D. Welch, David Blaauw, Jack Wadden, Weichen Zhou, Ryan E. Mills, Piyush Ranjan, Alan P. Boyle, and John R. Erb-Downward
- Subjects
DNA, Bacterial ,QH301-705.5 ,Interspersed repeat ,Respiratory System ,Method ,Sequence alignment ,QH426-470 ,Biology ,Genome ,Raw signal ,Deep Learning ,Classifier (linguistics) ,Genetics ,Humans ,Biology (General) ,Read-until ,business.industry ,Deep learning ,Pattern recognition ,Nanopore ,Nanopore Sequencing ,Long Interspersed Nucleotide Elements ,Oxford Nanopore ,Metagenome ,Base calling ,Nanopore sequencing ,Artificial intelligence ,business ,Real-time
- Abstract
We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-021-02511-y).
- Published
- 2021
14. DNA profiling and assessment of genetic diversity of relict species Allium altaicum Pall. on the territory of Altai
- Author
-
Olesya Raiser, Ainur Turzhanova, O.N. Khapilina, Alevtina Danilova, Vladislav Shevtsov, Ruslan Kalendar, Institute of Biotechnology, Crop Science Research Group, and Department of Agricultural Sciences
- Subjects
0106 biological sciences ,Allium altaicum ,Relict species ,Interspersed repeat ,Population ,Endangered species ,Biodiversity ,NUCLEAR ,Plant Science ,Biology ,Molecular marker ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,Genetic diversity ,03 medical and health sciences ,chemistry.chemical_compound ,REMAP ,REPEAT ,MARKERS ,IRAP ,Genetics ,education ,Molecular Biology ,LTR-RETROTRANSPOSONS ,030304 developmental biology ,0303 health sciences ,education.field_of_study ,Population Biology ,General Neuroscience ,SATIVUM L ,1184 Genetics, developmental biology, physiology ,General Medicine ,15. Life on land ,11831 Plant biology ,PLANT RETROTRANSPOSONS ,chemistry ,Evolutionary biology ,Threatened species ,TRANSPOSABLE ELEMENTS ,Interspersed elements ,BRAZILIAN GARLIC CULTIVARS ,Mobile genetic elements ,General Agricultural and Biological Sciences ,010606 plant biology & botany
- Abstract
Analysis of the genetic diversity of natural populations of threatened and endangered species of plants is a main aspect of conservation strategy. The endangered species Allium altaicum is a relict plant of the Ice Age and natural populations are located in extreme climatic conditions of Kazakhstan’s Altai Mountains. Mobile genetic elements and other interspersed repeats are basic components of a eukaryote genome, which can activate under stress conditions and indirectly promote the survival of an organism against environmental stresses. Detection of chromosomal changes related to recombination processes of mobile genetic elements is performed by various PCR methods. These methods are based on interspersed repeat sequences and are an effective tool for research of biological diversity of plants and their variability. In our research, we used conservative sequences of tRNA primer binding sites (PBS) when initializing the retrotransposon replication as PCR primers to research the genetic diversity of 12 natural populations of A. altaicum found in various ecogeographic conditions of the Kazakhstani Altai. High efficiency of the PBS amplification method used was observed already at the intrapopulation level. Unique amplicons representative of a certain population were found at the intrapopulation level. Analysis of molecular dispersion revealed that the biodiversity of populations of mountainous and lowland A. altaicum is due to intrapopulation differences for climatic zones of habitation. This is likely conditional upon predominance of vegetative reproduction over seed reproduction in some populations. In the case of vegetative reproduction, somatic recombination events related to the activity of mobile genetic elements are preserved in subsequent generations. This leads to an increase of intrapopulation genetic diversity. Thus, high genetic diversity was observed in populations such as A. altaicum located in the territory of the Kalbinskii Altai, whereas the minimum diversity was observed in the populations of the Leninororsk ecogeographic group. Distinctions between these populations were also identified depending on the areas of their distribution. Low-land and mid-mountain living environments are characterized by a great variety of shapes and plasticity. This work allowed us to obtain new genetic data on the structure of A. altaicum populations on the territory of the Kazakhstan Altai for the subsequent development of preservation and reproduction strategies for this relict species.
- Published
- 2021
15. Repetitive Sequences in Sesame Genome
- Author
-
Wenchao Lin, Lei Wang, Hongmei Miao, Haiyang Zhang, and Yamin Sun
- Subjects
Genetics ,Genome evolution ,Tandem repeat ,Interspersed repeat ,Interspersed Repetitive Sequences ,Biology ,Repeated sequence ,Genome ,Long terminal repeat ,Repeat unit - Abstract
Repetitive DNA sequences are homologous DNA fragments present in multiple copies and are the main components of higher plant genomes. A reliable chromosome-scale genome assembly has provided abundant information for genome structure research in sesame. In this chapter, we present the structural characteristics and distribution of interspersed repeat sequences and tandem repeats in the assembled genome of sesame var. Yuzhi 11 (version 3.0). For S. indicum, the content of interspersed repeat sequences is 46.11%. Of the five groups of interspersed repetitive sequences, i.e., SINEs, LINEs, long terminal repeat (LTR) elements, DNA elements, and unknown dispersed repeat elements, the LTR group is the biggest known repeat type and occupies 11.41% of the assembled genome (335 Mb). Ty1-Copia is the major superfamily in LTR-RTs. In the sesame genome, the distribution of tandem repeats, such as rDNA, telomere, and centromere repeats, reflects the character of the species. Cytogenetic analyses and genome structure showed that each of the 26 chromosomes in S. indicum has telomeric repeats at the terminal position. The telomeres of sesame are conserved, and their repeat sequence matches the common repeat unit TTTAGGG. A total of 1235 telomeric repeats of (TTTAGGG)3 were found in the assembled genome. Meanwhile, about three thousand 45S and 5S rDNA repeats are detected in the assembled genome, which accords with the rDNA content range in other plants. The distribution and copy number of telomeres and rDNAs reflect the relative completeness of the assembled sesame genome. In addition, the distribution of simple sequence repeats (SSRs) in the sesame genome has also been precisely analyzed. A 153 bp centromeric repeat sequence is identified in the sesame genome for the first time, which supplies more information for exploring the genome evolution of Sesamum.
- Published
- 2021
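The sesame abstract above reports a count of (TTTAGGG)3 telomeric repeats in the assembled genome. The sketch below shows one plausible way such a count could be made; it is not the authors' method. The function names, the choice to count maximal runs of at least three tandem units, and the decision to scan both strands are all assumptions made for illustration.

```python
import re

# Repeat unit quoted in the abstract; everything else here is illustrative.
TELOMERE_UNIT = "TTTAGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_telomeric_arrays(seq: str, min_units: int = 3) -> int:
    """Count maximal runs of at least `min_units` tandem copies of the unit.

    Both strands are scanned (assumption): TTTAGGG on the forward strand
    and its reverse complement CCCTAAA, which is how the repeat appears
    at the opposite chromosome end.
    """
    seq = seq.upper()
    total = 0
    for unit in (TELOMERE_UNIT, reverse_complement(TELOMERE_UNIT)):
        # The quantifier is greedy, so each run is counted once.
        pattern = re.compile(f"(?:{unit}){{{min_units},}}")
        total += sum(1 for _ in pattern.finditer(seq))
    return total
```

Applied per chromosome to sequences read from a FASTA file, the same function would also show whether the arrays sit at the chromosome ends, which is the property the abstract uses as a completeness indicator.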
16. Role of Transposable Elements in Gene Regulation in the Human Genome
- Author
-
Arsala Ali, Ping Liang, and Kyudong Han
- Subjects
Transposable element ,Genome evolution ,mobile elements ,Interspersed repeat ,Review ,Computational biology ,Biology ,Genome ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,0302 clinical medicine ,evolution ,microRNA ,biochemistry ,human ,lcsh:Science ,Gene ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Regulation of gene expression ,0303 health sciences ,Paleontology ,food and beverages ,Space and Planetary Science ,lcsh:Q ,Human genome ,transposable elements ,gene regulation ,030217 neurology & neurosurgery - Abstract
Transposable elements (TEs), also known as mobile elements (MEs), are interspersed repeats that constitute a major fraction of the genomes of higher organisms. As one of their important functional impacts on gene function and genome evolution, TEs participate in regulating the expression of genes nearby and even far away at transcriptional and post-transcriptional levels. There are two principal ways by which TEs regulate expression of genes in the human genome. First, TEs provide cis-regulatory sequences in the genome. TEs’ intrinsic regulatory properties for their own expression make them potential factors for regulating the expression of the host genes. TE-derived cis-regulatory sites are found in promoter and enhancer elements, providing binding sites for a wide range of trans-acting factors. Second, TEs encode regulatory RNAs. TE sequences have been revealed to be present in a substantial fraction of miRNAs and long non-coding RNAs (lncRNAs), indicating their TE origin. Furthermore, TE sequences were found to be critical for regulatory functions of these RNAs, including binding to the target mRNA. TEs thus play crucial regulatory roles by being part of cis-regulatory and regulatory RNA sequences. Moreover, both TE-derived cis-regulatory sequences and TE-derived regulatory RNAs have been implicated in providing evolutionary novelty to gene regulation. These TE-derived regulatory mechanisms also tend to function in a tissue-specific fashion. In this review, we aim to comprehensively cover the studies regarding these two aspects of TE-mediated gene regulation, mainly focusing on the mechanisms, contribution of different types of TEs, differential roles among tissue types, and lineage specificity, based on data mostly in humans.
- Published
- 2020
17. CGGBP1-regulated cytosine methylation at CTCF-binding motifs resists stochasticity
- Author
-
Divyesh Patel, Subhamoy Datta, Manthan Patel, and Umashankar Singh
- Subjects
0301 basic medicine ,lcsh:QH426-470 ,Interspersed repeat ,Cytosine methylation ,Biology ,Genome ,DNA sequencing ,Cytosine ,chemistry.chemical_compound ,03 medical and health sciences ,0302 clinical medicine ,Transduction, Genetic ,Transcription (biology) ,Gene expression ,Genetics ,Humans ,Allelic imbalance ,CGGBP1 ,Alleles ,Genetics (clinical) ,Transcription factor binding sites ,Stochasticity ,Binding Sites ,Chromosome Mapping ,Sequence Analysis, DNA ,Epigenome ,DNA Methylation ,CTCF ,Ctcf binding ,Cell biology ,DNA-Binding Proteins ,DNA binding site ,lcsh:Genetics ,HEK293 Cells ,030104 developmental biology ,chemistry ,DNA methylation ,DNA ,030217 neurology & neurosurgery ,Research Article - Abstract
The human CGGBP1 is implicated in a variety of cellular functions. It regulates genomic integrity, cell cycle, gene expression and cellular response to growth signals. Evidence suggests that these functions of CGGBP1 manifest through binding to GC-rich regions in the genome and regulation of interspersed repeats. Recent works show that CGGBP1 is needed for cytosine methylation homeostasis and genome-wide occupancy patterns of the epigenome regulator protein CTCF. It has remained unknown if cytosine methylation regulation and CTCF occupancy regulation by CGGBP1 are independent or interdependent processes. By sequencing immunoprecipitated methylated DNA, we have found that some transcription factor-binding sites resist stochastic changes in cytosine methylation. Of these, we have analyzed the CTCF-binding sites thoroughly and show that cytosine methylation regulation at CTCF-binding DNA sequence motifs by CGGBP1 is deterministic. These CTCF-binding sites are positioned at locations where the spread of cytosine methylation in cis depends on the levels of CGGBP1. Our findings suggest that CTCF occupancy and functions are determined by CGGBP1-regulated cytosine methylation patterns.
- Published
- 2020
18. Computer methods for visualization chromosome-specific DNA sequences in FISH images
- Author
-
Tatyana V. Karamysheva, Nikolay B. Rubtsov, and Anton Bogomolov
- Subjects
medicine.diagnostic_test ,Hybridization probe ,In silico ,Interspersed repeat ,medicine ,Chromosome ,Image processing ,Interspersed Repetitive Sequences ,Computational biology ,Biology ,DNA sequencing ,Fluorescence in situ hybridization - Abstract
A great number of interspersed repetitive sequences in chromosomes make it difficult to identify chromosomal material via fluorescence in situ hybridization (FISH). The traditional approach to solve this problem is chromosome in situ suppression hybridization (CISS-hybridization). Unfortunately, it cannot be performed, or fails, with chromosomes of many eukaryote species. The aim of this study was to consider the image enhancement procedure [1] and the in silico method of chromosome-specific signal visualization (method VISSIS) [2] as alternatives to CISS-hybridization. The effectiveness of the approaches for identification of specific signals was estimated by signal-to-background ratio (SNR). The computer methods were applied to images of human chromosomes obtained with FISH of whole chromosome painting DNA probes. Results showed that the effectiveness of image processing methods depends on the ratio of short and long interspersed elements (SINEs/LINEs) in DNA probes. The closer the chromosomes are in their ratio of SINEs/LINEs, the higher the specific signal intensities and signal-to-background ratios that could be achieved. This suggests that computer methods can be efficient only with application of DNA probes derived from chromosomes characterized by a similar ratio of SINE and LINE contents.
- Published
- 2020
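The FISH abstract above scores its methods by signal-to-background ratio but does not state how the ratio is computed. As a rough sketch only, one common reading is the mean intensity of pixels inside the labelled region divided by the mean intensity of the remaining pixels. The function name, the flat pixel list, and the boolean mask are hypothetical and are not taken from the paper.

```python
from statistics import mean


def signal_to_background(pixels, signal_mask):
    """Mean intensity of signal pixels divided by mean intensity of the rest.

    `pixels` is a flat sequence of intensities and `signal_mask` a parallel
    sequence of booleans marking which pixels belong to the specific signal.
    This definition is an assumption, not the one used in the study.
    """
    signal = [p for p, is_signal in zip(pixels, signal_mask) if is_signal]
    background = [p for p, is_signal in zip(pixels, signal_mask) if not is_signal]
    if not signal or not background:
        raise ValueError("need at least one signal and one background pixel")
    return mean(signal) / mean(background)
```

A higher value means the painted chromosome stands out more clearly from the rest of the image, which is the sense in which the abstract compares methods.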
19. Chromosome level assembly reveals a unique immune gene organization and signatures of evolution in the common pheasant
- Author
-
Xiao Lu, Lingxiao Luo, Jinmei Ding, He Meng, Chuan He, Hongyan Yuan, Lele Zhao, Lu Xuelin, Huaixi Luo, Fisayo T. Akinyemi, Ke Xu, Han Chengxiao, Lingyu Yang, Hao Zhou, and Zheng Yuming
- Subjects
0106 biological sciences ,0301 basic medicine ,Interspersed repeat ,010603 evolutionary biology ,01 natural sciences ,Genome ,Chromosomes ,Evolution, Molecular ,Major Histocompatibility Complex ,03 medical and health sciences ,Genetics ,Gene family ,Animals ,Galliformes ,Ecology, Evolution, Behavior and Systematics ,Phylogeny ,Comparative genomics ,Phylogenetic tree ,biology ,food and beverages ,biology.organism_classification ,030104 developmental biology ,Evolutionary biology ,Multigene Family ,Microchromosome ,Common pheasant ,Phasianus ,Biotechnology - Abstract
The common pheasant Phasianus colchicus, belonging to the order Galliformes and family Phasianidae, is the most widespread species. Despite a long history of captivity, the domestication of this bird is still at a preliminary stage. Recently, the demand for accelerating its transformation to poultry for meat and egg production has been increasing. In this study, we assembled a high-quality, chromosome-scale genome of the common pheasant by using PacBio long reads, next-generation short reads, and Hi-C technology. The primary assembly has a contig N50 size of 1.33 Mb and a scaffold N50 size of 59.46 Mb, with a total size of 0.99 Gb, resolving most macrochromosomes into single scaffolds. A total of 23,058 genes and 10.71 Mb interspersed repeats were identified, constituting 30.31% and 10.71% of the common pheasant genome, respectively. Our phylogenetic analysis revealed that the common pheasant shared common ancestors with turkey about 24.7-34.5 million years ago (Ma). Rapidly evolved gene families, as well as branch-specific positively selected genes, indicate that calcium-related genes are potentially related to the adaptive and evolutionary change of the common pheasant. Interestingly, we found that the common pheasant has a unique major histocompatibility complex B locus (MHC-B) structure: three major inversions occurred in the sequence compared with chicken MHC-B. Furthermore, we detected signals of selection in five breeds of domestic common pheasant, several of which are production-oriented.
- Published
- 2020
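The pheasant abstract above quotes contig and scaffold N50 values. For readers unfamiliar with the statistic, N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly. The sketch below is a generic illustration of that definition; the function name and the toy lengths in the usage note are made up and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")
```

For the toy lengths 2, 2, 2, 3, 3, 4, 8 and 8 (total 32), the two longest sequences already reach 16, so the N50 is 8.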
20. LINE-1 specific nuclear organization in mice olfactory sensory neurons
- Author
-
Leonardo Fontoura Ormundo, Cleiton F. Machado, Erika Demasceno Sakamoto, Viviane Simões, and Lucia M. Armelin-Correa
- Subjects
0301 basic medicine ,Male ,Heterochromatin ,Interspersed repeat ,Retrotransposon ,Biology ,Receptors, Odorant ,Olfactory Receptor Neurons ,Histones ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,0302 clinical medicine ,Olfactory Mucosa ,medicine ,Constitutive heterochromatin ,Animals ,Molecular Biology ,Cell Nucleus ,Olfactory receptor ,Neurogenesis ,Cell Biology ,Cell biology ,Mice, Inbred C57BL ,030104 developmental biology ,medicine.anatomical_structure ,Long Interspersed Nucleotide Elements ,Gene Expression Regulation ,Human genome ,Olfactory epithelium ,030217 neurology & neurosurgery - Abstract
Long interspersed nuclear elements-1 (LINE-1) are mobile DNA elements that comprise the majority of interspersed repeats in the mammalian genome. During the last decade, these transposable sequences have been described as controlling elements involved in transcriptional regulation and genome plasticity. Recently, LINE-1 have been implicated in neurogenesis, but to date little is known about their nuclear organization in neurons. The olfactory epithelium is a site of continuous neurogenesis, and loci of olfactory receptor genes are enriched in LINE-1 copies. Olfactory neurons have a unique inverted nuclear architecture and constitutive heterochromatin forms a block in the center of the nuclei. Our DNA FISH images show that, even though LINE-1 copies are dispersed throughout the mice genome, they are clustered forming a cap around the central heterochromatin block and frequently occupy the same position as facultative heterochromatin in olfactory neurons nuclei. This specific LINE-1 organization could not be observed in other olfactory epithelium cell types. Analyses of H3K27me3 and H3K9me3 ChIP-seq data from olfactory epithelium revealed that LINE-1 copies located at OR gene loci show different enrichment for these heterochromatin marks. We also found that LINE-1 are transcribed in mouse olfactory epithelium. These results suggest that LINE-1 play a role in the olfactory neurons' nuclear architecture. SIGNIFICANCE STATEMENT: LINE-1 are mobile DNA elements and comprise almost 20% of mice and human genomes. These retrotransposons have been implicated in neurogenesis. We show for the first time that LINE-1 retrotransposons have a specific nuclear organization in olfactory neurons, forming aggregates concentric to the heterochromatin block and frequently occupying the same region as facultative heterochromatin. 
We found that LINE-1 at olfactory receptor gene loci are differently enriched for H3K9me3 and H3K27me3, but LINE-1 transcripts could be detected in the olfactory epithelium. We speculate that these retrotransposons play an active role in olfactory neurons' nuclear architecture.
- Published
- 2020
21. MIR sequences recruit zinc finger protein ZNF768 to expressed genes
- Author
-
Michael Kluge, Caroline C. Friedel, Axel Imhof, Stefan Krebs, Ann Katrin Greifenberg, Muhammad Ahmad Maqbool, Yousra Yahia, Ignasi Forné, Nicolas Descostes, Jean-Christophe Andrau, Dirk Eick, Michaela Rohrmoser, Matthias Geyer, Anita Gruber-Eber, and Helmut Blum
- Subjects
Euchromatin ,Retroelements ,Transcription, Genetic ,Cell Survival ,Interspersed repeat ,Biology ,ELP3 ,03 medical and health sciences ,0302 clinical medicine ,Cell Line, Tumor ,Gene expression ,Genetics ,Humans ,Nucleotide Motifs ,Gene ,030304 developmental biology ,Repetitive Sequences, Nucleic Acid ,Regulation of gene expression ,Zinc finger ,0303 health sciences ,Binding Sites ,Gene regulation, Chromatin and Epigenetics ,DNA ,Cell biology ,Gene Expression Regulation ,Sequence motif ,030217 neurology & neurosurgery ,Transcription Factors - Abstract
Mammalian-wide interspersed repeats (MIRs) are retrotransposed elements of mammalian genomes. Here, we report the specific binding of zinc finger protein ZNF768 to the sequence motif GCTGTGTG (N20) CCTCTCTG in the core region of MIRs. ZNF768 binding is preferentially associated with euchromatin and promoter regions of genes. Binding was observed for genes expressed in a cell type-specific manner in human B cell line Raji and osteosarcoma U2OS cells. Mass spectrometric analysis revealed binding of ZNF768 to Elongator components Elp1, Elp2 and Elp3 and other nuclear factors. The N-terminus of ZNF768 contains a heptad repeat array structurally related to the C-terminal domain (CTD) of RNA polymerase II. This array evolved in placental animals but not marsupials and monotreme species, displays species-specific length variations, and possibly fulfills CTD related functions in gene regulation. We propose that the evolution of MIRs and ZNF768 has extended the repertoire of gene regulatory mechanisms in mammals and that ZNF768 binding is associated with cell type-specific gene expression.
- Published
- 2018
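The ZNF768 abstract above gives the bound motif as GCTGTGTG (N20) CCTCTCTG. Assuming N20 means a spacer of exactly 20 arbitrary nucleotides, the motif can be written as a regular expression and scanned for in a sequence. This is a sketch of that reading only: the function name is hypothetical, just the given strand is searched, and no mismatches are tolerated, which is likely stricter than real binding-site analysis.

```python
import re

# Motif from the abstract, with N20 read as exactly 20 unspecified bases
# (assumption). The lookahead lets overlapping occurrences be reported.
ZNF768_MOTIF = re.compile(r"(?=GCTGTGTG[ACGT]{20}CCTCTCTG)")


def find_motif_sites(seq: str) -> list:
    """Return 0-based start positions of exact motif matches on the given strand."""
    return [match.start() for match in ZNF768_MOTIF.finditer(seq.upper())]
```

Scanning the reverse complement of the sequence as well would cover sites on the opposite strand.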
22. An EnSpm interspersed repeat identified in Triticum aestivum and implicated in resistance to Diuraphis noxia
- Author
-
Anandi Bierman and Anna-Maria Botha
- Subjects
0106 biological sciences ,0301 basic medicine ,Ecology ,biology ,Resistance (ecology) ,Interspersed repeat ,food and beverages ,Soil Science ,Aphididae ,Plant Science ,biology.organism_classification ,01 natural sciences ,Hemiptera ,Diuraphis noxia ,03 medical and health sciences ,030104 developmental biology ,Botany ,PEST analysis ,Russian wheat aphid ,010606 plant biology & botany - Abstract
Diuraphis noxia Kurdjumov, 1913 (Hemiptera: Aphididae), commonly known as the Russian wheat aphid, is a devastating pest of wheat and barley. Although fourteen sources of resistance (Dn genes) have...
- Published
- 2018
23. Genomic Organization of TBK1 Copy Number Variations in Glaucoma Patients
- Author
-
Young H. Kwon, Robert Ritch, Alan L. Robin, Edwin M. Stone, Wallace L.M. Alward, Todd E. Scheetz, Adam P. DeLuca, John H. Fingert, Kazuhide Kawase, and Jeffrey M. Liebmann
- Subjects
Male ,0301 basic medicine ,DNA Copy Number Variations ,DNA Mutational Analysis ,Interspersed repeat ,Alu element ,Protein Serine-Threonine Kinases ,Gene dosage ,Article ,03 medical and health sciences ,0302 clinical medicine ,Humans ,Medicine ,Low Tension Glaucoma ,Copy-number variation ,Intraocular Pressure ,Chromosome 12 ,Genomic organization ,Genetics ,business.industry ,Chromosome ,DNA ,Interspersed Repetitive Sequences ,Pedigree ,Ophthalmology ,030104 developmental biology ,Mutation ,030221 ophthalmology & optometry ,Female ,business - Abstract
BACKGROUND Approximately 1% of normal tension glaucoma (NTG) cases are caused by TANK-binding kinase 1 (TBK1) gene duplications and triplications. However, the precise borders and orientation of these TBK1 gene copy number variations (CNVs) on chromosome 12 are unknown. METHODS We determined the exact borders of TBK1 CNVs and the orientation of duplicated or triplicated DNA segments in 5 NTG patients with different TBK1 mutations using whole-genome sequencing. RESULTS Tandemly duplicated chromosome segments spanning the TBK1 gene were detected in 4 NTG patients, each with unique borders. Four of 5 CNVs had borders located within interspersed repetitive DNA sequences (Alu and long interspersed nuclear element-L1 elements), suggesting that mismatched homologous recombinations likely generated these CNVs. A fifth NTG patient had a complex rearrangement including triplication of a chromosome segment spanning the TBK1 gene. CONCLUSIONS No specific mutation hotspots for TBK1 CNVs were detected, however, interspersed repetitive sequences (ie, Alu elements) were identified at the borders of TBK1 CNVs, which suggest that mismatch of these elements during meiosis may be the mechanism that generated TBK1 gene dosage mutations.
- Published
- 2017
24. Widespread horizontal transfer of retrotransposons.
- Author
-
Walsh, Ali Morton, Kortschak, R. Daniel, Gardner, Michael G., Bertozzi, Terry, and Adelson, David L.
- Subjects
*TRANSPOSONS , *RETROTRANSPOSONS , *GENETIC transformation , *VERTEBRATE genetics , *ARTHROPOD vectors - Abstract
In higher organisms such as vertebrates, it is generally believed that lateral transfer of genetic information does not readily occur, with the exception of retroviral infection. However, horizontal transfer (HT) of protein coding repetitive elements is the simplest way to explain the patchy distribution of BovB, a long interspersed element (LINE) about 3.2 kb long, that has been found in ruminants, marsupials, squamates, monotremes, and African mammals. BovB sequences are a major component of some of these genomes. Here we show that HT of BovB is significantly more widespread than believed, and we demonstrate the existence of two plausible arthropod vectors, specifically reptile ticks. A phylogenetic tree built from BovB sequences from species in all of these groups does not conform to expected evolutionary relationships of the species, and our analysis indicates that at least nine HT events are required to explain the observed topology. Our results provide compelling evidence for HT of genetic material that has transformed vertebrate genomes. [ABSTRACT FROM AUTHOR]
- Published
- 2013
25. Genomic and evolutionary analyses of Tango transposons in Aedes aegypti, Anopheles gambiae and other mosquito species.
- Author
-
Coy, M. R. and Tu, Z.
- Subjects
*AEDES aegypti , *ANOPHELES gambiae , *MOSQUITOES , *SURVEYS , *INSECTS - Abstract
Tango is a transposon of the Tc1 family and was originally discovered in the African malaria mosquito, Anopheles gambiae. Here we report a systematic analysis of the genome sequence of the yellow fever mosquito, Aedes aegypti, which uncovered three distinct Tango transposons. We name the only An. gambiae Tango transposon AgTango1 and the three Ae. aegypti Tango elements AeTango1–3. Like AgTango1, AeTango1 and AeTango2 elements both have members that retain characteristics of autonomous elements such as intact open reading frames and terminal inverted repeats (TIRs). AeTango3 is a degenerate transposon with no full-length members. All full-length Tango transposons contain subterminal direct repeats within their TIRs. AgTango1 and AeTango1–3 form a single clade among other Tc1 transposons. Within this clade, AgTango1 and AeTango1 are closely related and share approximately 80% identity at the amino acid level, which exceeds the level of similarity of the majority of host genes in the two species. A survey of Tango in other mosquito species was carried out using degenerate PCR. Tango was isolated and sequenced in all members of the An. gambiae species complex, Aedes albopictus and Ochlerotatus atropalpus. Oc. atropalpus contains a rich diversity of Tango elements, while Tango elements in Ae. albopictus and the An. gambiae species complex all belong to Tango1. No Tango was detected in Culex pipiens quinquefasciatus, Anopheles stephensi, Anopheles dirus, Anopheles farauti or Anopheles albimanus using degenerate PCR. Bioinformatic searches of the Cx. p. quinquefasciatus (~10 × coverage) and An. stephensi (0.33 × coverage) databases also failed to uncover any Tango elements. Although other evolutionary scenarios cannot be ruled out, there are indications that Tango1 underwent horizontal transfer among divergent mosquito species. [ABSTRACT FROM AUTHOR]
- Published
- 2007
26. Similar Evolutionary Trajectories for Retrotransposon Accumulation in Mammals
- Author
-
Reuben M. Buckley, David L. Adelson, Joy M. Raison, and R. Daniel Kortschak
- Subjects
0301 basic medicine ,Genome evolution ,Retroelements ,Interspersed repeat ,genetic processes ,Genomics ,Retrotransposon ,Biology ,genome evolution ,Genome ,Evolution, Molecular ,03 medical and health sciences ,0302 clinical medicine ,Species Specificity ,Genetics ,Animals ,Regulatory Elements, Transcriptional ,Gene ,Ecology, Evolution, Behavior and Systematics ,Synteny ,Comparative genomics ,Mammals ,genome architecture ,fungi ,food and beverages ,transposable element ,Chromatin ,Long interspersed nuclear element ,030104 developmental biology ,Evolutionary biology ,health occupations ,030217 neurology & neurosurgery ,Research Article - Abstract
The factors guiding retrotransposon insertion site preference are not well understood. Different types of retrotransposons share common replication machinery and yet occupy distinct genomic domains. Autonomous long interspersed elements accumulate in gene-poor domains and their non-autonomous short interspersed elements accumulate in gene-rich domains. To determine genomic factors that contribute to this discrepancy we analysed the distribution of retrotransposons within the framework of chromosomal domains and regulatory elements. Using comparative genomics, we identified large-scale conserved patterns of retrotransposon accumulation across several mammalian genomes. Importantly, retrotransposons that were active after our sample-species diverged accumulated in orthologous regions. This suggested a similar evolutionary interaction between retrotransposon activity and conserved genome architecture across our species. In addition, we found that retrotransposons accumulated at regulatory element boundaries in open chromatin, where accumulation of particular retrotransposon types depended on insertion size and local regulatory element density. From our results, we propose a model where density and distribution of genes and regulatory elements canalise retrotransposon accumulation. Through conservation of synteny, gene regulation and nuclear organisation, mammalian genomes with dissimilar retrotransposons follow similar evolutionary trajectories.
- Published
- 2017
27. Helitrons and Retrotransposons Are Co-localized in Bos taurus Genomes
- Author
-
V Glazko, S Kovalchuk, A Babii, T Glazko, and G Kosovsky
- Subjects
0301 basic medicine ,Inverted repeat ,Interspersed repeat ,Retrotransposon ,Biology ,Genome ,Article ,Bos taurus ,Genome scanning ,03 medical and health sciences ,030104 developmental biology ,Evolutionary biology ,Genetics ,Helitron ,Microsatellite ,Gene pool ,Mobile genetic elements ,Helitrons ,Transposable elements ,Microsatellites ,Genetics (clinical) - Abstract
Background: Helitrons, a class of DNA transposons, are mobile genetic elements responsible for major movements of the genetic material within and across different genomes. This ability makes helitrons suitable candidate elements for the development of new approaches to multilocus genotyping of livestock animals, along with the well-known microsatellite loci. Objective: We aimed to estimate the informativeness of helitron and microsatellite markers in assessing the consolidation and the "gene pool" standards of two commercial dairy cattle breeds (Ayrshire breed and holsteinized Black-and-White cattle) and one local breed of Kalmyk cattle, and to reveal any inter-breed difference in the organization of genomic regions flanked by helitrons in the studied cattle breeds. Method: We used the combination of two highly polymorphic genomic elements, helitrons and trinucleotide microsatellites (AGC)6G and (GAG)6C, respectively, for genome scanning of the sampled groups of cattle. We also pyrosequenced the genomic regions flanked by the inverted repeats of the 3' end of helitron fragments of the Heligloria family. Results: In general, both marker combinations generated polymorphic spectra from which some interbreed differentiation could be observed. The analysis of the identified interspersed repeats suggests that in factory and local cattle the genomic regions flanked by helitron fragments are shaped differently and contain different superfamilies of transposable elements, especially retrotransposons. Conclusion: Despite the well-known fact of retrotransposon-dependent microsatellite expansion, our data suggest that, in the cattle genome, DNA transposons and microsatellites can also be found in close proximity, and that helitrons and retrotransposons may form domains of increased variability that are targets for factors of artificial selection.
- Published
- 2017
28. Gambol and Tc1 are two distinct families of DD34E transposons: analysis of the Anopheles gambiae genome expands the diversity of the IS630-Tc1-mariner superfamily.
- Author
-
Coy, M. R. and Zhijian Tu
- Subjects
*TRANSPOSONS , *MOBILE genetic elements , *ANIMALS , *ANOPHELES gambiae , *MALARIA - Abstract
Tc1 is a family of DNA transposons found in diverse organisms including vertebrates, invertebrates and fungi. Tc1 belongs to the IS630-Tc1-mariner superfamily, which is characterized by common ‘TA’ target site and conserved D(Asp)DE(Glu) or DDD catalytic triad. All functional Tc1-like transposons contain a transposase with a DD34E catalytic triad. We conducted a systematic analysis of DD34E transposons in the African malaria mosquito, Anopheles gambiae, using a reiterative and exhaustive search program. In addition to previously described Tc1-like elements, we uncovered 26 new DD34E transposons including a novel family that we named gambol. Designation of family status to gambol is based on phylogenetic analyses of transposase sequences that showed gambol and Tc1 transposons as distinct clades that were separated by mariner and other families of the IS630-Tc1-mariner superfamily. The distinction between Tc1 and gambol is also consistent with the unique TIRs in gambol elements and the presence of a ‘W[I/L/V]DEDC’ signature near their N-termini. This signature is predicted as part of the ‘RED’ domain, a component of the ‘PAI’ and ‘RED’ DNA binding domains in Tc1 and possibly mariner. Although gambol appears to be related to a few DD34E transposons from cyanobacteria and fungi, no gambol has been reported in any other insects or animals thus far. Several gambol and Tc1 elements have intact ORFs and different genomic copies with high sequence identity, which suggests that they may have been recently active. [ABSTRACT FROM AUTHOR]
- Published
- 2005
29. Whole-genome expression analysis of mammalian-wide interspersed repeat elements in human cell lines
- Author
-
Anastasia Conti, Davide Carnevali, Giorgio Dieci, and Matteo Pellegrini
- Subjects
0301 basic medicine ,RNA polymerase III ,Retroelements ,Transcription, Genetic ,1.1 Normal biological development and functioning ,Interspersed repeat ,Plant Biology & Botany ,Computational biology ,Biology ,ENCODE ,Genome ,03 medical and health sciences ,chemistry.chemical_compound ,Genetic ,Transcription (biology) ,RNA polymerase ,Genetics ,2.1 Biological and endogenous factors ,Humans ,RNA-Seq ,Molecular Biology ,Gene ,Sequence Analysis, RNA ,Gene Expression Profiling ,Human Genome ,Computational Biology ,General Medicine ,mammalian-wide interspersed repeats ,Full Papers ,SINE ,Interspersed Repetitive Sequences ,030104 developmental biology ,chemistry ,Hela Cells ,RNA ,Human genome ,Generic health relevance ,Sequence Analysis ,Transcription ,Overlapping gene ,Biotechnology ,HeLa Cells ,Plasmids - Abstract
With more than 500,000 copies, mammalian-wide interspersed repeats (MIRs), a sub-group of SINEs, represent ∼2.5% of the human genome and one of the most numerous families of potential targets for the RNA polymerase (Pol) III transcription machinery. Since MIR elements ceased to amplify ∼130 myr ago, previous studies primarily focused on their genomic impact, while the issue of their expression has not been extensively addressed. We applied a dedicated bioinformatic pipeline to ENCODE RNA-Seq datasets of seven human cell lines and, for the first time, we were able to define the Pol III-driven MIR transcriptome at single-locus resolution. While the majority of Pol III-transcribed MIR elements are cell-specific, we discovered a small set of ubiquitously transcribed MIRs mapping within Pol II-transcribed genes in antisense orientation that could influence the expression of the overlapping gene. We also identified novel Pol III-transcribed ncRNAs, deriving from transcription of annotated MIR fragments flanked by unique MIR-unrelated sequences, and confirmed the role of Pol III-specific internal promoter elements in MIR transcription. Besides demonstrating widespread transcription at these retrotranspositionally inactive elements in human cells, the ability to profile MIR expression at single-locus resolution will facilitate their study in different cell types and states including pathological alterations.
- Published
- 2016
30. The genome of the zoonotic malaria parasite Plasmodium simium reveals adaptations to host-switching
- Author
-
Stefan T. Arold, Anielle de Pina-Costa, Patrícia Brasil, Francisco J. Guzmán-Vega, Cláudio Tadeu Daniel-Ribeiro, Filipe Vieira Santos de Abreu, Richard Culleton, Zelinda Maria Braga Hirano, Qingtian Guan, Denise Anete Madureira de Alvarenga, Cristiana Ferreira Alves de Brito, Cesare Bianco Júnior, Olga Douvropoulou, Sarah Forrester, Daniel C. Jeffares, Abhinav Kaushik, Júlio César de Souza Junior, Silvia Bahadian Moreira, Alcides Pissinatti, Ricardo Lourenço de Oliveira, Maria de Fátima Ferreira-da-Cruz, Tobias Mourier, and Arnab Pain
- Subjects
Genetics ,0303 health sciences ,education.field_of_study ,Zoonotic Infection ,biology ,030231 tropical medicine ,Interspersed repeat ,Population ,Plasmodium vivax ,biology.organism_classification ,medicine.disease ,Plasmodium ,Genome ,3. Good health ,03 medical and health sciences ,0302 clinical medicine ,parasitic diseases ,medicine ,education ,Gene ,Malaria ,030304 developmental biology - Abstract
Plasmodium simium, a malaria parasite of non-human primates in the Atlantic forest region of Brazil, was recently shown to cause zoonotic infection in humans in the region. Phylogenetic analyses based on the whole genome sequences of six P. simium isolates infecting humans and two isolates from brown howler monkeys revealed that P. simium is monophyletic within the broader diversity of South American Plasmodium vivax, consistent with the hypothesis that P. simium first infected non-human primates as a result of a host-switch from humans carrying P. vivax. We provide molecular evidence that the current zoonotic infections of people have likely resulted from multiple independent host switches, each seeded from a different monkey infection. Very low levels of genetic diversity within P. simium genomes and the absence of P. simium-P. vivax hybrids suggest that the P. simium population emerged recently and has subsequently experienced a period of independent evolution in Platyrrhini monkeys. We further find that Plasmodium Interspersed Repeat (PIR) genes, Plasmodium Helical Interspersed Subtelomeric (PHIST) genes and Tryptophan-Rich Antigens (TRAg) genes in P. simium are genetically divergent from P. vivax and are enriched for non-synonymous single nucleotide polymorphisms, consistent with the rapid evolution of these genes. Analysis of genes involved in erythrocyte invasion revealed several notable differences between P. vivax and P. simium, including large deletions within the coding region of the Duffy Binding Protein 1 (DBP1) and Reticulocyte Binding Protein 2a (RBP2a) genes in P. simium. Genotyping of P. simium isolates from non-human primates (NHPs) and zoonotic human infections showed that a precise deletion of 38 amino acids in DBP1 is exclusively present in all human infecting isolates, whereas non-human primate infecting isolates were polymorphic for the deletion. We speculate that these deletions in the parasite-encoded key erythrocyte invasion ligands and the additional rapid genetic changes have facilitated zoonotic transfer to humans. Non-human primate malaria parasites can be considered a reservoir of potential infectious human parasites that must be considered in any attempt at malaria elimination. The genome of P. simium will thus form an important basis for future functional characterizations on the mechanisms underlying malaria zoonosis.
- Published
- 2019
31. CGGBP1 regulates CTCF occupancy at repeats
- Author
-
Subhamoy Datta, Manthan Patel, Divyesh Patel, and Umashankar Singh
- Subjects
CCCTC-Binding Factor ,lcsh:QH426-470 ,Interspersed repeat ,Cell Line ,Histones ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Humans ,RNA, Small Interfering ,Molecular Biology ,030304 developmental biology ,Cell Nucleus ,Regulation of gene expression ,0303 health sciences ,Binding Sites ,biology ,Research ,Chromatin ,Cell biology ,ChIP-sequencing ,DNA-Binding Proteins ,lcsh:Genetics ,Histone ,CTCF ,030220 oncology & carcinogenesis ,DNA methylation ,biology.protein ,H3K4me3 ,RNA Interference ,Protein Processing, Post-Translational - Abstract
Background: CGGBP1 is a repeat-binding protein with diverse functions in the regulation of gene expression, cytosine methylation, repeat silencing and genomic integrity. CGGBP1 has also been identified as a cooperator of histone-modifying enzymes and as a component of CTCF-containing complexes that regulate the enhancer–promoter looping. CGGBP1–CTCF cross talk in chromatin regulation has been hitherto unknown. Results: Here, we report that the occupancy of CTCF at repeats depends on CGGBP1. Using ChIP-sequencing for CTCF, we describe its occupancy at repetitive DNA. Our results show that endogenous level of CGGBP1 ensures CTCF occupancy preferentially on repeats over canonical CTCF motifs. By combining CTCF ChIP-sequencing results with ChIP sequencing for three different kinds of histone modifications (H3K4me3, H3K9me3 and H3K27me3), we show that the CGGBP1-dependent repeat-rich CTCF-binding sites regulate histone marks in flanking regions. Conclusion: CGGBP1 affects the pattern of CTCF occupancy. Our results posit CGGBP1 as a regulator of CTCF and its binding sites in interspersed repeats.
- Published
- 2019
32. The mitochondrial genome of Morchella importuna (272.2 kb) is the largest among fungi and contains numerous introns, mitochondrial non-conserved open reading frames and repetitive sequences
- Author
-
Yingli Cai, Ma Xiaolong, Lianfu Chen, Qianqian Zhang, Fang Shu, Yinbing Bian, and Wei Liu
- Subjects
Mitochondrial DNA ,Interspersed repeat ,02 engineering and technology ,Biochemistry ,Homing endonuclease ,03 medical and health sciences ,Open Reading Frames ,Ascomycota ,Structural Biology ,Group I catalytic intron ,Molecular Biology ,Gene ,Phylogeny ,030304 developmental biology ,Repetitive Sequences, Nucleic Acid ,Genetics ,0303 health sciences ,biology ,Intron ,Molecular Sequence Annotation ,General Medicine ,Group II intron ,Ribosomal RNA ,021001 nanoscience & nanotechnology ,Introns ,Mitochondria ,Genome, Mitochondrial ,biology.protein ,0210 nano-technology - Abstract
The complete mitochondrial genome of Morchella importuna, the famous edible and medicinal mushroom, was assembled as a 272,238 bp single circular dsDNA. As the largest mitogenome among fungi, it exhibits several distinct characteristics. The mitogenome of M. importuna encoded 14 core conserved mitochondrial protein-coding genes and 151 mitochondrial non-conserved open reading frames (ncORFs) were predicted, of which 61 were annotated as homing endonuclease genes, and 108 were confirmed to be expressed during the vegetative growth stages of M. importuna. In addition, 34 introns were identified in seven core genes (cob, cox1, cox2, cox3, nad1, nad4 and nad5) and two rRNA genes (rrnS and rrnL) with a length from 383 bp to 7453 bp, and eight large introns with a length range of 2340 bp to 7453 bp contained multiple intronic mtORFs. Moreover, 34 group I (IA, IB, IC1, IC2, ID and derived group I introns) and four group II intron domains were identified for the 34 introns, including five hybrid ones. Furthermore, the M. importuna mitogenome showed the presence of about 18.7% mitogenomic interspersed repeats. These and the aforementioned ncORFs and introns, contributed to the enlarged size of the mitogenome.
- Published
- 2019
33. ImtRDB: a database and software for mitochondrial imperfect interspersed repeats annotation
- Author
-
Valeria N. Timonina, Konstantin Gunbin, Konstantin Popadin, and Viktor N. Shamanskiy
- Subjects
0106 biological sciences ,Mitochondrial DNA ,lcsh:QH426-470 ,lcsh:Biotechnology ,Interspersed repeat ,Biology ,computer.software_genre ,01 natural sciences ,Genome ,Database ,03 medical and health sciences ,chemistry.chemical_compound ,lcsh:TP248.13-248.65 ,Genetics ,Degeneracy (biology) ,030304 developmental biology ,0303 health sciences ,mtDNA ,Algorithms ,DNA, Circular/genetics ,DNA, Mitochondrial/genetics ,Databases, Genetic ,Repetitive Sequences, Nucleic Acid ,Software ,Imperfect repeats ,Selection on dinucleotides ,lcsh:Genetics ,Chloroplast DNA ,chemistry ,DNA microarray ,computer ,DNA ,GC-content ,010606 plant biology & botany ,Biotechnology - Abstract
Mitochondria are the powerhouses of eukaryotic cells and carry their own circular DNA (mtDNA) encoding various RNAs and proteins. Somatic perturbations of mtDNA accumulate with age, so it is of great importance to uncover the main sources of mtDNA instability. Recent analyses demonstrated that somatic mtDNA deletions depend on imperfect repeats of various kinds between distant mtDNA segments. However, there are still no comprehensive databases annotating all types of imperfect repeats in the many species with completely sequenced mitochondrial genomes, nor are there algorithms capable of calling all types of imperfect repeats in circular mtDNA. We implemented a naive pattern-recognition algorithm, by analogy to standard dot-plot construction procedures, that finds both perfect and imperfect repeats of four main types: direct, inverted, mirror and complementary. Our algorithm is adapted to specific characteristics of mtDNA such as circularity and an excess of short repeats: it calls imperfect repeats starting from a length of 10 bp. We constructed ImtRDB, an interactive, web-available database that stores the positions of perfect and imperfect repeats in the mtDNAs of more than 3500 vertebrate species. Additional tools, such as visualization of repeats within a genome, comparison of repeat densities among different genomes and the option to download all results, make this database useful for many biologists. Our first analyses of the database demonstrated that mtDNA imperfect repeats (i) are usually short; (ii) are associated with unfolded DNA structures; (iii) fall into four types that correlate positively with each other, forming two equivalent pairs, direct and mirror versus inverted and complementary, with identical nucleotide content and similar distribution between species; (iv) are negatively associated in abundance with GC content; and (v) show overrepresentation of the dinucleotides GC versus CG on the light chain of mtDNA covered by repeats. ImtRDB is available at http://bioinfodbs.kantiana.ru/ImtRDB/ . It is accompanied by software that calls all types of interspersed repeats with different levels of degeneracy in circular DNA. This database and software can become a very useful tool in various areas of mitochondrial and chloroplast DNA research.
- Published
- 2019
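The ImtRDB abstract above describes a naive, dot-plot-like search for perfect and imperfect repeats of four types (direct, inverted, mirror, complementary) in circular DNA. The Python sketch below illustrates that general idea only; it is not the ImtRDB software. The function names, the fixed window length, the mismatch threshold and the rule that the two windows may not overlap are all assumptions made for illustration.

```python
# Illustrative sketch only: a naive all-against-all window comparison on a
# circular sequence. Names and parameters are hypothetical, not ImtRDB's API.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def window(seq, start, length):
    """Return `length` bases starting at `start`, wrapping around the circle."""
    n = len(seq)
    return "".join(seq[(start + k) % n] for k in range(length))


def transform(word, kind):
    """Map the second window so that a repeat of type `kind` reads like the first."""
    if kind == "direct":
        return word
    if kind == "mirror":
        return word[::-1]
    if kind == "complementary":
        return "".join(COMPLEMENT[b] for b in word)
    if kind == "inverted":
        return "".join(COMPLEMENT[b] for b in reversed(word))
    raise ValueError(f"unknown repeat type: {kind}")


def find_repeats(seq, kind="direct", length=10, max_mismatches=1):
    """Return (i, j, mismatches) for every non-overlapping window pair that
    matches with at most `max_mismatches` differences. Cost is O(n^2 * length)."""
    n = len(seq)
    hits = []
    for i in range(n):
        first = window(seq, i, length)
        for j in range(i + 1, n):
            # Assumed rule: skip pairs whose windows overlap on the circle.
            if j - i < length or n - (j - i) < length:
                continue
            second = transform(window(seq, j, length), kind)
            mismatches = sum(a != b for a, b in zip(first, second))
            if mismatches <= max_mismatches:
                hits.append((i, j, mismatches))
    return hits
```

Because window positions are taken modulo the sequence length, a repeat copy that spans the arbitrary start coordinate of the circular genome is found like any other. A realistic tool would also merge adjacent hits into longer repeats, which this sketch does not attempt.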
34. Large-scale potential RNA editing profiling in different adult chicken tissues
- Author
-
H. Shafiei, M. R. Bakhtiarizadeh, and A. Salehi
- Subjects
0301 basic medicine ,biology ,Interspersed repeat ,0402 animal and dairy science ,RNA ,RNA-Seq ,04 agricultural and veterinary sciences ,General Medicine ,Computational biology ,040201 dairy & animal science ,Transcriptome ,03 medical and health sciences ,030104 developmental biology ,RNA editing ,Organ Specificity ,Gene expression ,Genetics ,biology.protein ,GABRA3 ,Animals ,Animal Science and Zoology ,RNA Editing ,Gene ,Chickens - Abstract
RNA editing is a post-transcription maturation process that diversifies genomically encoded information and can lead to transcriptome diversity. Thanks to next-generation sequencing technologies, a large number of editing sites have been identified in different species. Although this mechanism is well described in mammals, only a few studies have been performed in chicken. Here, candidate or potential RNA editing sites were identified in eight different tissues of chicken (brain, spleen, colon, lung, kidney, heart, testes and liver). We identified 68 A-to-G editing sites in 46 genes. Only two of these were previously reported in chicken. We found no C-to-T sites, attesting to the lack of this type of editing mechanism in chicken. Similar to mammals, the editing sites were enriched in non-coding regions, rarely resulted in a change in amino acids, showed a critical role in the nervous system and had a low guanosine level upstream of the editing site and some enrichment downstream from the site. Moreover, in contrast to mammals, editing sites were weakly enriched in interspersed repeats and the number and editing ratio of non-synonymous sites were higher than for those of synonymous sites. Interestingly, we found several tissue-specific edited genes, including GABRA3, SORL1 and HTR1D in brain and RYR2 and FHOD3 in heart, that were associated with functional processes relevant to the corresponding tissue. This finding highlights the importance of RNA editing in several chicken tissues, especially the brain, and establishes a foundation for further exploration of this process.
- Published
- 2019
35. Evolutionary Analysis of the F-Box Gene Family in Saccharomycetaceae
- Author
-
Shiheng Tao, Yanlin Liu, Siddiq Ur Rahman, Mingyue Yao, Rashid Mehmood, Sayed Haidar Abbas Raza, Ailan Wang, and Tao Ma
- Subjects
0301 basic medicine ,Genome evolution ,Interspersed repeat ,Evolution, Molecular ,03 medical and health sciences ,0302 clinical medicine ,Molecular evolution ,Gene duplication ,Genetics ,Gene family ,Selection, Genetic ,Molecular Biology ,Gene ,biology ,F-Box Proteins ,Cell Biology ,General Medicine ,Genomics ,biology.organism_classification ,Yeast ,030104 developmental biology ,Saccharomycetaceae ,030220 oncology & carcinogenesis ,Saccharomycetales ,Genome, Fungal - Abstract
F-box proteins are a core component of Skp1-Cul1-F-box (SCF) ubiquitin ligase complexes and are involved in many cellular processes in yeasts. However, the molecular evolution of the F-box gene family in yeasts remains unclear. In this study, 136 F-box genes were identified in 10 yeast species of the Saccharomycetaceae. In addition to the F-box domain, six other domains were identified in these F-box proteins. The evolutionary history of F-box gene numbers in the 10 Saccharomycetaceae yeasts was reconstructed. Whole-genome duplication, interspersed repeats, and gene loss events were inferred. These events contributed to F-box gene number variation in the 10 yeast species. Eighty-seven and 33 positively selected sites were detected with the Selecton program and the Datamonkey web server, respectively. Three of them were considered significant positively selected sites, and TreeSAAP showed that 23 of them had changed radically in amino acid properties. We investigated F-box gene number variation, its underlying mechanisms, and selection patterns, all of which help to better understand genome evolution and the function of the F-box proteins.
- Published
- 2019
36. SQuIRE reveals locus-specific regulation of interspersed repeat expression
- Author
-
Clarissa N Pacyna, Lindsay M. Payer, Daniel Ardeljan, Kathleen H. Burns, and Wan R Yang
- Subjects
Transposable element ,Transcription, Genetic ,RNA Splicing ,Interspersed repeat ,Locus (genetics) ,Computational biology ,Biology ,Genome ,03 medical and health sciences ,Mice ,0302 clinical medicine ,Transcription (biology) ,Gene expression ,Genetics ,Animals ,030304 developmental biology ,0303 health sciences ,Sequence Analysis, RNA ,Amyotrophic Lateral Sclerosis ,Disease Models, Animal ,Drosophila melanogaster ,Genetic Loci ,RNA splicing ,DNA Transposable Elements ,Methods Online ,Human genome ,030217 neurology & neurosurgery ,Software - Abstract
Transposable elements (TEs) are interspersed repeat sequences that make up much of the human genome. Their expression has been implicated in development and disease. However, TE-derived RNA-seq reads are difficult to quantify. Past approaches have excluded these reads or aggregated RNA expression to subfamilies shared by similar TE copies, sacrificing quantitative accuracy or the genomic context necessary to understand the basis of TE transcription. As a result, the effects of TEs on gene expression and associated phenotypes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), the first RNA-seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression (https://github.com/wyang17/SQuIRE). SQuIRE is an accurate and user-friendly tool that can be used for a variety of species. We applied SQuIRE to RNA-seq from normal mouse tissues and a Drosophila model of amyotrophic lateral sclerosis. In both model organisms, we recapitulated previously reported TE subfamily expression levels and revealed locus-specific TE expression. We also identified differences in TE transcription patterns relating to transcript type, gene expression and RNA splicing that would be lost with other approaches using subfamily-level analyses. Altogether, our findings illustrate the importance of studying TE transcription with locus-level resolution.
- Published
- 2019
37. Comparative analysis of repetitive sequences among species from the potato and the tomato clades
- Author
-
Hans de Jong, Paola Gaiero, M. Eric Schranz, Magdalena Vaio, Pablo Speranza, and Sander Peters
- Subjects
0106 biological sciences ,DNA, Plant ,Interspersed repeat ,Introgression ,Context (language use) ,Plant Science ,Laboratorium voor Erfelijkheidsleer ,Solanum ,010603 evolutionary biology ,01 natural sciences ,Genome ,Evolution, Molecular ,BIOS Applied Bioinformatics ,Solanum lycopersicum ,Groep Koornneef ,Clade ,Genome size ,Phylogeny ,Solanaceae ,Repetitive Sequences, Nucleic Acid ,Solanum tuberosum ,Relative abundance ,biology ,Phylogenetic tree ,fungi ,Solanum etuberosum ,Repeat profiles ,food and beverages ,Sequence Analysis, DNA ,Original Articles ,biology.organism_classification ,Biosystematiek ,Crop wild relatives ,Evolutionary biology ,Biosystematics ,Laboratory of Genetics ,EPS ,Transposable elements ,Genome, Plant ,010606 plant biology & botany - Abstract
BACKGROUND AND AIMS: The genus Solanum includes important vegetable crops and their wild relatives. Introgression of their useful traits into elite cultivars requires effective recombination between hom(e)ologues, which is partially determined by genome sequence differentiation. In this study we compared the repetitive genome fractions of wild and cultivated species of the potato and tomato clades in a phylogenetic context. METHODS: Genome skimming followed by a clustering approach was used as implemented in the RepeatExplorer pipeline. Repeat classes were annotated and the sequences of their main domains were compared. KEY RESULTS: Repeat abundance and genome size were correlated and the larger genomes of species in the tomato clade were found to contain a higher proportion of unclassified elements. Families and lineages of repetitive elements were largely conserved between the clades, but their relative proportions differed. The most abundant repeats were Ty3/Gypsy elements. Striking differences in abundance were found in the highly dynamic Ty3/Gypsy Chromoviruses and Ty1/Copia Tork elements. Within the potato clade, early branching Solanum cardiophyllum showed a divergent repeat profile. There were also contrasts between cultivated and wild potatoes, mostly due to satellite amplification in the cultivated species. Interspersed repeat profiles were very similar among potatoes. The repeat profile of Solanum etuberosum was more similar to that of the potato clade. CONCLUSIONS: The repeat profiles in Solanum seem to be very similar despite genome differentiation at the level of collinearity. Removal of transposable elements by unequal recombination may have been responsible for structural rearrangements across the tomato clade. Sequence variability in the tomato clade is congruent with clade-specific amplification of repeats after its divergence from S. etuberosum and potatoes. 
The low differentiation among potato and its wild relatives at the level of interspersed repeats may explain the difficulty in discriminating their genomes by genomic in situ hybridization techniques.
- Published
- 2019
38. Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers
- Author
-
Hideya Kawaji, Masayoshi Itoh, Yuji Tanaka, Piero Carninci, Albin Sandelin, Alistair R. R. Forrest, Yoshihide Hayashizaki, Timo Lassmann, Bogumil Kaczkowski, and Robin Andersson
- Subjects
0301 basic medicine ,Regulation of gene expression ,Genetics ,Cancer Research ,Gene Expression Profiling ,Interspersed repeat ,Promoter ,Computational biology ,Biology ,Cap analysis gene expression ,Gene Expression Regulation, Neoplastic ,Gene expression profiling ,Transcriptome ,03 medical and health sciences ,030104 developmental biology ,Oncology ,Cell Line, Tumor ,Neoplasms ,Biomarkers, Tumor ,Humans ,Enhancer ,Gene - Abstract
Genes that are commonly deregulated in cancer are clinically attractive as candidate pan-diagnostic markers and therapeutic targets. To globally identify such targets, we compared Cap Analysis of Gene Expression profiles from 225 different cancer cell lines and 339 corresponding primary cell samples to identify transcripts that are deregulated recurrently in a broad range of cancer types. Comparing RNA-seq data from 4,055 tumors and 563 normal tissues profiled in The Cancer Genome Atlas and FANTOM5 datasets, we identified a core transcript set with theranostic potential. Our analyses also revealed enhancer RNAs, which are upregulated in cancer, defining promoters that overlap with repetitive elements (especially SINE/Alu and LTR/ERV1 elements) that are often upregulated in cancer. Lastly, we documented for the first time upregulation of multiple copies of the REP522 interspersed repeat in cancer. Overall, our genome-wide expression profiling approach identified a comprehensive set of candidate biomarkers with pan-cancer potential, and extended the perspective and pathogenic significance of repetitive elements that are frequently activated during cancer progression. Cancer Res; 76(2); 216–26. ©2015 AACR.
- Published
- 2016
39. Characterization of contiguous gene deletions in COL4A6 and COL4A5 in Alport syndrome-diffuse leiomyomatosis
- Author
-
Shogo Minamikawa, China Nagano, Tomohiko Yamamura, Kandai Nozu, Motoko Yanagita, Koichi Nakanishi, Hiroshi Kaito, Takeshi Ninchoji, Eihiko Takahashi, Kazumoto Iijima, Yoshimitsu Gotoh, Naoya Morisada, Shuichiro Fujinaga, Ichiro Morioka, Takahiro Morishita, Masafumi Oka, Shiro Yamada, and Igor Vorechovsky
- Subjects
Collagen Type IV ,0301 basic medicine ,medicine.medical_specialty ,Interspersed repeat ,030232 urology & nephrology ,Nephritis, Hereditary ,Retrotransposon ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Leiomyomatosis ,otorhinolaryngologic diseases ,Genetics ,medicine ,Humans ,Alport syndrome ,Genetics (clinical) ,Base Sequence ,Breakpoint ,Cytogenetics ,medicine.disease ,Molecular biology ,030104 developmental biology ,Human genome ,Homologous recombination ,Gene Deletion - Abstract
Alport syndrome-diffuse leiomyomatosis (AS-DL, OMIM: 308940) is a rare variant of the X-linked Alport syndrome that shows overgrowth of visceral smooth muscles in the gastrointestinal, respiratory and female reproductive tracts in addition to renal symptoms. AS-DL results from deletions that encompass the 5′ ends of the COL4A5 and COL4A6 genes, but deletion breakpoints between COL4A5 and COL4A6 have been determined in only four cases. Here, we characterize deletion breakpoints in five AS-DL patients and show a contiguous COL4A6/COL4A5 deletion in each case. We also demonstrate that eight out of nine deletion alleles involved sequences homologous between COL4A5 and COL4A6. Most breakpoints took place in recognizable transposed elements, including long and short interspersed repeats, DNA transposons and long-terminal repeat retrotransposons. Because deletions involved the bidirectional promoter region in each case, we suggest that the occurrence of leiomyomatosis in AS-DL requires inactivation of both genes. Altogether, our study highlights the importance of homologous recombination involving multiple transposed elements for the development of this continuous gene syndrome and other atypical loss-of-function phenotypes.
- Published
- 2017
40. Biosynthesis of Circular RNA ciRS-7/CDR1as Is Mediated by Mammalian-wide Interspersed Repeats
- Author
-
Karim Rahimi, Akila Mayeda, Thomas B. Hansen, Rei Yoshimoto, and Jørgen Kjems
- Subjects
0301 basic medicine ,Multidisciplinary ,Inverted repeat ,Interspersed repeat ,RNA ,Alu element ,02 engineering and technology ,Computational biology ,Biological Sciences ,Biology ,021001 nanoscience & nanotechnology ,Article ,03 medical and health sciences ,030104 developmental biology ,Circular RNA ,RNA splicing ,CRISPR ,lcsh:Q ,lcsh:Science ,0210 nano-technology ,Molecular Biology ,Gene - Abstract
Summary: Circular RNAs (circRNAs) are stable non-coding RNAs with a closed circular structure. One of the best studied circRNAs is ciRS-7 (CDR1as), which acts as a regulator of the microRNA miR-7; however, its biosynthetic pathway has remained an enigma. Here we delineate the biosynthetic pathway of ciRS-7. The back-splicing events that form circRNAs are often facilitated by flanking inverted repeats of the primate-specific Alu elements. The ciRS-7 gene lacks these elements, but, instead, we identified a set of flanking inverted elements belonging to the mammalian-wide interspersed repeat (MIR) family. Splicing reporter assays in HEK293 cells demonstrated that these inverted MIRs are required to generate ciRS-7 through back-splicing, and CRISPR/Cas9-mediated deletions confirmed the requirement of the endogenous MIR elements in SH-SY5Y cells. Using bioinformatic searches, we identified several other MIR-dependent circRNAs and confirmed them experimentally. We propose that MIR-mediated RNA circularization is used to generate a subset of mammalian circRNAs.
Highlights:
• The circular RNA, ciRS-7 (CDR1as), functions as a regulator of miR-7
• ciRS-7 is generated by back-splicing, not via intra-lariat splicing
• Back-splicing of ciRS-7 is promoted by the flanking inverted MIR elements
• The biosynthesis of a subset of mammalian circRNAs could be mediated by MIRs
- Published
- 2020
41. Traumatic globe dislocation into the ethmoidal sinus
- Author
-
Devjyoti Tripathy
- Subjects
0301 basic medicine ,Genetics ,Images In… ,biology ,Oligonucleotide ,business.industry ,Interspersed repeat ,Monozygotic twin ,General Medicine ,030105 genetics & heredity ,Genome ,DNA sequencing ,Restriction fragment ,03 medical and health sciences ,genomic DNA ,0302 clinical medicine ,biology.protein ,Medicine ,Restriction fragment length polymorphism ,business ,030217 neurology & neurosurgery - Abstract
A genomic differential display method was developed that analyzes many restriction fragment length polymorphisms simultaneously. Interspersed repeat sequences were used to reduce DNA sample complexity and to target genomic subsets of interest. This work focused on trinucleotide repeats because of their importance in human inherited diseases. Immobilized repeat-containing oligonucleotides were used to capture genomic DNA fragments containing sequences complementary to the oligonucleotide. Captured fragments were amplified by PCR and fluorescently labeled using primers complementary to the repeat sequence and/or to the known sequences ligated to the ends of the restriction fragments. The labeled PCR fragments were displayed by size on a high-resolution automated fluorescent DNA sequencing instrument. Although there was a conservation in the overall pattern of displayed genome subsets, many clear and reproducible differences were detected when genomes from different individuals were compared. Fewer differences were detected within, than between, monozygotic twin pair genomes. In control experiments, the method distinguished between Huntington disease alleles with normal and expanded CAG repeat lengths.
- Published
- 2020
42. Biosynthesis of Circular RNA ciRS-7/CDR1as Is Mediated by Mammalian-Wide Interspersed Repeats (MIRs)
- Author
-
Akila Mayeda, Karim Rahimi, Thomas Riisgaard Hansen, Rei Yoshimoto, and Jørgen Kjems
- Subjects
Inverted repeat ,Circular RNA ,RNA splicing ,Interspersed repeat ,Alu element ,RNA ,CRISPR ,Computational biology ,Biology ,Gene - Abstract
Summary: Circular RNAs (circRNAs) are stable noncoding RNAs with a closed circular structure. One of the first and best studied circRNAs is ciRS-7 (CDR1as), which acts as a regulator of the microRNA miR-7; however, its biosynthesis pathway has remained an enigma. Here we delineate the biosynthesis pathway of ciRS-7. The back-splicing events that form circRNAs are often facilitated by flanking inverted repeats of the primate-specific Alu elements. The ciRS-7 gene lacks these elements but, instead, we identified a set of flanking inverted elements belonging to the mammalian-wide interspersed repeat (MIR) family. Splicing reporter assays in HEK293 cells demonstrated that these inverted MIRs are required to generate ciRS-7 through back-splicing, and CRISPR/Cas9-mediated deletions confirmed the requirement of the endogenous MIR elements in SH-SY5Y cells. Using bioinformatics searches, we identified several other MIR-dependent circRNAs that we confirmed experimentally. We propose that MIR-mediated RNA circularization constitutes a new widespread biosynthesis principle for mammalian circRNAs.
- Published
- 2018
43. Assessing genome assembly quality using the LTR Assembly Index (LAI)
- Author
-
Ning Jiang, Jinfeng Chen, and Shujun Ou
- Subjects
2. Zero hunger ,0301 basic medicine ,Chromosomes, Artificial, Bacterial ,Retroelements ,viruses ,Interspersed repeat ,Computational Biology ,Sequence assembly ,Oryza ,Retrotransposon ,Genomics ,Computational biology ,Biology ,Solanum ,Genome ,03 medical and health sciences ,030104 developmental biology ,Metric (mathematics) ,Genetics ,Methods Online ,Genome size ,Gene ,Genome, Plant ,Software ,Selection (genetic algorithm) - Abstract
Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeat that is poorly assembled in draft genomes. Here, we propose a reference-free genome metric called LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we reveal the significant gain of assembly continuity by using long-read-based techniques over short-read-based methods. Moreover, LAI can facilitate iterative assembly improvement with assembler selection and identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% to the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
- Published
- 2018
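The LAI abstract above states an applicability condition: intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% of the genome size, respectively. A minimal Python sketch of that check follows. The function name and its base-pair inputs are hypothetical and are not part of the LTR_retriever or LAI programs; only the two thresholds come from the abstract.

```python
# Illustrative sketch only: checks the two content thresholds quoted in the
# abstract. The function and argument names are assumptions, not LAI's API.

def lai_applicable(genome_size_bp, intact_ltr_bp, total_ltr_bp):
    """Return True if both LTR-RT content thresholds from the abstract are met."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    intact_pct = 100 * intact_ltr_bp / genome_size_bp  # intact LTR-RT share, %
    total_pct = 100 * total_ltr_bp / genome_size_bp    # total LTR-RT share, %
    return intact_pct >= 0.1 and total_pct >= 5.0
```

For example, a hypothetical 1 Mb assembly with 2 kb of intact and 60 kb of total LTR-RT sequence (0.2% and 6%) would pass, while the same assembly with only 500 bp of intact LTR-RTs (0.05%) would not.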
44. Lineage tracing using a Cas9-deaminase barcoding system targeting endogenous L1 elements
- Author
-
Goo Jang, Yujin Jeon, Wookjae Lee, Namjin Cho, Duhee Bang, Byungjin Hwang, and Soo Young Yum
- Subjects
0301 basic medicine ,Genome evolution ,Science ,Interspersed repeat ,General Physics and Astronomy ,Mutagenesis (molecular biology technique) ,Genomics ,02 engineering and technology ,Computational biology ,Biology ,Cell fate determination ,medicine.disease_cause ,Time-Lapse Imaging ,General Biochemistry, Genetics and Molecular Biology ,Article ,03 medical and health sciences ,Genome editing ,CRISPR-Associated Protein 9 ,Cytidine Deaminase ,medicine ,DNA Barcoding, Taxonomic ,Humans ,Cell Lineage ,lcsh:Science ,Gene Editing ,Mutation ,Multidisciplinary ,Cas9 ,Cell Differentiation ,General Chemistry ,Cytidine deaminase ,021001 nanoscience & nanotechnology ,030104 developmental biology ,HEK293 Cells ,Long Interspersed Nucleotide Elements ,Mutagenesis ,lcsh:Q ,Single-Cell Analysis ,0210 nano-technology ,HeLa Cells ,RNA, Guide, Kinetoplastida - Abstract
Determining cell lineage and function is critical to understanding human physiology and pathology. Although advances in lineage tracing methods provide new insight into cell fate, defining cellular diversity at the mammalian level remains a challenge. Here, we develop a genome editing strategy using a cytidine deaminase fused with nickase Cas9 (nCas9) to specifically target endogenous interspersed repeat regions in mammalian cells. The resulting mutation patterns serve as a genetic barcode, which is induced by targeted mutagenesis with single-guide RNA (sgRNA), leveraging substitution events, and subsequent read out by a single primer pair. By analyzing interspersed mutation signatures, we show the accurate reconstruction of cell lineage using both bulk cell and single-cell data. We envision that our genetic barcode system will enable fine-resolution mapping of organismal development in healthy and diseased mammalian states. Lineage tracing has provided new insights into cell fate but defining cellular diversity remains a challenge. Here the authors target endogenous repeat regions in mammalian cells with cytidine deaminase fused to nCas9 to create genetic barcodes for fine-resolution mapping.
- Published
- 2018
45. Large-scale RNA editing profiling in different adult chicken tissues
- Author
-
M. R. Bakhtiarizadeh, A. Salehi, and H. Shafiei
- Subjects
chemistry.chemical_classification ,biology ,Interspersed repeat ,RNA ,Guanosine ,Computational biology ,Amino acid ,Transcriptome ,chemistry.chemical_compound ,chemistry ,RNA editing ,biology.protein ,GABRA3 ,Gene - Abstract
RNA editing is a post-transcription maturation process that diversifies genomically encoded information and can lead to diversity and complexity of the transcriptome, especially in the brain. Thanks to next-generation sequencing technologies, a large number of editing sites have been identified in different species, especially in human, mouse and rat. While this mechanism is well described in mammals, only a few studies have been performed in the chicken. Here, we developed a rigorous computational strategy to identify RNA editing sites in eight different tissues of the chicken (brain, spleen, colon, lung, kidney, heart, testes and liver), based on RNA sequencing data alone. We identified 68 A-to-G editing sites in 46 genes. Only two of these were previously reported in chicken. We found no C-to-U sites, attesting to the lack of this type of editing mechanism in the chicken. Similar to mammals, the editing sites were enriched in non-coding regions, rarely resulted in change of amino acids, showed a critical role in the nervous system and had a low guanosine level upstream of the editing site and some enrichment downstream from the site. Moreover, in contrast to mammals, editing sites were weakly enriched in interspersed repeats and the frequency and editing ratio of non-synonymous sites were higher than those of synonymous sites. Interestingly, we found several tissue-specific edited genes including GABRA3, SORL1 and HTR1D in brain and RYR2 and FHOD3 in heart that were associated with functional processes relevant to the corresponding tissue. This finding highlighted the importance of RNA editing in several chicken tissues, especially the brain. This study extends our understanding of RNA editing in chicken tissues and establishes a foundation for further exploration of this process.
- Published
- 2018
46. SQuIRE: Software for Quantifying Interspersed Repeat Elements
- Author
-
Lindsay M. Payer, Pacyna Cn, Daniel Ardeljan, Kathleen H. Burns, and Wan Rou Yang
- Subjects
0303 health sciences ,business.industry ,Computer science ,Interspersed repeat ,Repetitive Sequences ,Retrotransposon ,Computational biology ,Genome project ,03 medical and health sciences ,0302 clinical medicine ,Software ,Squire ,Consensus sequence ,Human genome ,business ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Transposable elements are interspersed repeat sequences that make up much of the human genome. Conventional approaches to RNA-seq analysis often exclude these sequences, fail to optimally adjudicate read alignments, or align reads to interspersed repeat consensus sequences without considering these transcripts in their genomic contexts. As a result, repetitive sequence contributions to transcriptomes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), an RNA-seq analysis pipeline that integrates repeat and genome annotation (RepeatMasker), read alignment (STAR), gene expression (StringTie) and differential expression (DESeq2). SQuIRE uniquely provides a locus-specific picture of interspersed repeat-encoded RNA expression. SQuIRE can be downloaded at (github.com/wyang17/SQuIRE).
- Published
- 2018
47. TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data
- Author
-
Ramesh Rajaby and Wing-Kin Sung
- Subjects
0301 basic medicine ,Databases, Factual ,Interspersed repeat ,Transposition (telecommunications) ,Biology ,computer.software_genre ,Genome ,DNA sequencing ,03 medical and health sciences ,Genetics ,Humans ,Cluster analysis ,Database ,Genome, Human ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Genomics ,Mutagenesis, Insertional ,030104 developmental biology ,Filter (video) ,DNA Transposable Elements ,Methods Online ,Human genome ,computer ,Algorithms ,Reference genome - Abstract
Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
- Published
- 2018
48. GenomeLandscaper: Landscape analysis of genome-fingerprints maps assessing chromosome architecture
- Author
-
Ai Hannan, Fanmei Meng, and Ai Yuncan
- Subjects
0301 basic medicine ,Interspersed repeat ,lcsh:Medicine ,Genomics ,Biology ,Genome ,Article ,03 medical and health sciences ,Tandem repeat ,Phylogenetics ,Chromosome architecture ,Databases, Genetic ,lcsh:Science ,Multidisciplinary ,lcsh:R ,Chromosome Mapping ,Computational Biology ,DNA Fingerprinting ,030104 developmental biology ,DNA profiling ,Evolutionary biology ,Landscape analysis ,lcsh:Q ,Algorithms ,Software - Abstract
Assessing correctness of an assembled chromosome architecture is a central challenge. We create a geometric analysis method (called GenomeLandscaper) to conduct landscape analysis of genome-fingerprints maps (GFM), trace large-scale repetitive regions, and assess their impacts on the global architectures of assembled chromosomes. We develop an alignment-free method for phylogenetics analysis. The human Y chromosomes (GRCh.chrY, HuRef.chrY and YH.chrY) are analysed as a proof-of-concept study. We construct a galaxy of genome-fingerprints maps (GGFM) for them, and a landscape compatibility among relatives is observed. But a long sharp straight line on the GGFM breaks such a landscape compatibility, distinguishing GRCh38p1.chrY (and throughout GRCh38p7.chrY) from GRCh37p13.chrY, HuRef.chrY and YH.chrY. We delete a 1.30-Mbp target segment to rescue the landscape compatibility, matching the antecedent GRCh37p13.chrY. We re-locate it into the modelled centromeric and pericentromeric region of GRCh38p10.chrY, matching a gap placeholder of GRCh37p13.chrY. We decompose it into sub-constituents (such as BACs, interspersed repeats, and tandem repeats) and trace their homologues by phylogenetics analysis. We elucidate that most examined tandem repeats are of reasonable quality, but the BAC-sized repeats, 173U1020C (176.46 Kbp) and 5U41068C (205.34 Kbp), are likely over-repeated. These results offer unique insights into the centromeric and pericentromeric regions of the human Y chromosomes.
- Published
- 2018
49. Repetitive DNA in eukaryotic genomes
- Author
-
Ettore Olmo, Maria Assunta Biscotti, and J. S. Heslop-Harrison
- Subjects
Comparative genomics ,Genetics ,Genome ,Interspersed repeat ,Eukaryota ,Genomics ,DNA ,Genome project ,Biology ,Noncoding DNA ,Evolutionary biology ,Cot analysis ,Human genome ,Repeated sequence ,Repetitive Sequences, Nucleic Acid - Abstract
Repetitive DNA--sequence motifs repeated hundreds or thousands of times in the genome--makes up the major proportion of all the nuclear DNA in most eukaryotic genomes. However, the significance of repetitive DNA in the genome is not completely understood, and it has been considered to have both structural and functional roles, or perhaps even no essential role. High-throughput DNA sequencing reveals huge numbers of repetitive sequences. Most bioinformatic studies focus on low-copy DNA including genes, and hence, the analyses collapse repeats in assemblies presenting only one or a few copies, often masking out and ignoring them in both DNA and RNA read data. Chromosomal studies are proving vital to examine the distribution and evolution of sequences because of the challenges of analysis of sequence data. Many questions are open about the origin, evolutionary mode and functions that repetitive sequences might have in the genome. Some, the satellite DNAs, are present in long arrays of similar motifs at a small number of sites, while others, particularly the transposable elements (DNA transposons and retrotransposons), are dispersed over regions of the genome; in both cases, sequence motifs may be located at relatively specific chromosome domains such as centromeres or subtelomeric regions. Here, we overview a range of works involving detailed characterization of the nature of all types of repetitive sequences, in particular their organization, abundance, chromosome localization, variation in sequence within and between chromosomes, and, importantly, the investigation of their transcription or expression activity. Comparison of the nature and locations of sequences between more, and less, related species is providing extensive information about their evolution and amplification. Some repetitive sequences are extremely well conserved between species, while others are among the most variable, defining differences between even closely related species.
These data suggest contrasting modes of evolution of repetitive DNA of different types, including selfish sequences that propagate themselves and may even be transferred horizontally between species rather than by descent, through to sequences that have a tendency to amplification because of their sequence motifs, to those that have structural significance because of their bulk rather than precise sequence. Functional consequences of repeats include generation of variability by movement and insertion in the genome (giving useful genetic markers), the definition of centromeres, expression under stress conditions and regulation of gene expression via RNA moieties. Molecular cytogenetics and bioinformatic studies in a comparative context are now enabling understanding of the nature and behaviour of this major genomic component.
- Published
- 2015
50. Repetitive DNA sequences in plant genomes
- Author
-
A. B. Shcherban
- Subjects
Genetics ,Variable number tandem repeat ,Tandem repeat ,Cot analysis ,Interspersed repeat ,Animal Science and Zoology ,Human genome ,Biology ,Repeated sequence ,Agronomy and Crop Science ,Genome ,Gene - Abstract
The main classes of repetitive DNA sequences, including coding (rRNA genes) and noncoding (tandem and interspersed repeats) sequences are reviewed. Emphasis is placed on their special role in the formation of the structural and functional organization of the genomes of higher plants and in the support of their higher genetic variation, compared to animal genomes, at the levels of individual sequences and of the whole genome.
- Published
- 2015
Discovery Service for Jio Institute Digital Library