Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing
- Source :
- Bioinformatics
- Publication Year :
- 2016
- Publisher :
- Oxford University Press, 2016.
Abstract
- Motivation: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure, which require indirect inference methods such as assembly, long reads allow direct inference of satellite higher-order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation, we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), which takes advantage of Pacific Biosciences long reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly.
Results: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite-containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere higher-order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies.
Availability and implementation: Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI.
Contact: ali.bashir@mssm.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
- Subjects :
- 0301 basic medicine
Statistics and Probability
Source code
Centromere
Genomics
Computational biology
Biology
Biochemistry
Genome
Set (abstract data type)
03 medical and health sciences
0302 clinical medicine
Tandem repeat
Pregnancy
Consensus Sequence
Humans
Molecular Biology
Sequence (medicine)
Genetics
Hydatidiform Mole
Sequence Analysis, DNA
Genome Analysis
Computer Science Applications
Computational Mathematics
Variable number tandem repeat
Discovery Note
030104 developmental biology
Computational Theory and Mathematics
Tandem Repeat Sequences
Female
030217 neurology & neurosurgery
Algorithms
Details
- Language :
- English
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 32
- Issue :
- 13
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....5ea88c5eaa631f248389a161235e88fb