43 results on '"Besenbacher, Søren"'
Search Results
2. A method to build extended sequence context models of point mutations and indels
- Author
-
Bethune, Jörn, Kleppe, April, and Besenbacher, Søren
- Published
- 2022
- Full Text
- View/download PDF
3. Expression patterns and prognostic potential of circular RNAs in mantle cell lymphoma: a study of younger patients from the MCL2 and MCL3 clinical trials
- Author
-
Dahl, Mette, Husby, Simon, Eskelund, Christian W., Besenbacher, Søren, Fjelstrup, Søren, Côme, Christophe, Ek, Sara, Kolstad, Arne, Räty, Riikka, Jerkeman, Mats, Geisler, Christian H., Kjems, Jørgen, Kristensen, Lasse S., and Grønbæk, Kirsten
- Published
- 2022
- Full Text
- View/download PDF
4. Prognostic miRNA classifier in early-stage mycosis fungoides: development and validation in a Danish nationwide study
- Author
-
Lindahl, Lise M., Besenbacher, Søren, Rittig, Anne H., Celis, Pamela, Willerslev-Olsen, Andreas, Gjerdrum, Lise M.R., Krejsgaard, Thorbjørn, Johansen, Claus, Litman, Thomas, Woetmann, Anders, Odum, Niels, and Iversen, Lars
- Published
- 2018
- Full Text
- View/download PDF
5. Correction: Expression patterns and prognostic potential of circular RNAs in mantle cell lymphoma: a study of younger patients from the MCL2 and MCL3 clinical trials
- Author
-
Dahl, Mette, Husby, Simon, Eskelund, Christian W., Besenbacher, Søren, Fjelstrup, Søren, Côme, Christophe, Ek, Sara, Kolstad, Arne, Räty, Riikka, Jerkeman, Mats, Geisler, Christian H., Kjems, Jørgen, Kristensen, Lasse S., and Grønbæk, Kirsten
- Published
- 2022
- Full Text
- View/download PDF
6. Direct estimation of mutations in great apes reconciles phylogenetic dating
- Author
-
Besenbacher, Søren, Hvilsom, Christina, Marques-Bonet, Tomas, Mailund, Thomas, and Schierup, Mikkel Heide
- Published
- 2019
- Full Text
- View/download PDF
7. Prediction of Primary Tumors in Cancers of Unknown Primary
- Author
-
Søndergaard Dan, Nielsen Svend, Pedersen Christian N.S., and Besenbacher Søren
- Subjects
cancer of unknown origin, classification, transcriptomics, precision medicine, Biotechnology, TP248.13-248.65
- Abstract
A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to identify the location of the primary tumor. CUPs account for 3–5% of cancer cases. Using molecular data to determine the location of the primary tumor in such cases can help doctors make the right treatment choice and thus improve the clinical outcome. In this paper, we present a new method for predicting the location of the primary tumor using gene expression data: locating cancers of unknown primary (LoCUP). The method models the data as a mixture of normal and tumor cells and thus allows correct classification even in impure samples, where the tumor biopsy is contaminated by a large fraction of normal cells. We find that our method provides a significant increase in classification accuracy (95.8% over 90.8%) on simulated low-purity metastatic samples and shows potential on a small dataset of real metastasis samples with known origin.
- Published
- 2017
- Full Text
- View/download PDF
8. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
- Author
-
Bertl, Johanna, Guo, Qianyun, Juul, Malene, Besenbacher, Søren, Nielsen, Morten Muhlig, Hornshøj, Henrik, Pedersen, Jakob Skou, and Hobolth, Asger
- Published
- 2018
- Full Text
- View/download PDF
9. Author Correction: Direct estimation of mutations in great apes reconciles phylogenetic dating
- Author
-
Besenbacher, Søren, Hvilsom, Christina, Marques-Bonet, Tomas, Mailund, Thomas, and Schierup, Mikkel Heide
- Published
- 2019
- Full Text
- View/download PDF
10. The Proteome of Seed Development in the Model Legume Lotus japonicus
- Author
-
Dam, Svend, Laursen, Brian S., Ørnfelt, Jane H., Jochimsen, Bjarne, Staerfeldt, Hans Henrik, Friis, Carsten, Nielsen, Kasper, Goffard, Nicolas, Besenbacher, Soren, Krusell, Lene, Sato, Shusei, Tabata, Satoshi, Thogersen, Ida B., Enghild, Jan J., and Stougaard, Jens
- Published
- 2009
- Full Text
- View/download PDF
11. Unsupervised detection of fragment length signatures of circulating tumor DNA using non-negative matrix factorization.
- Author
-
Renaud, Gabriel, Nørgaard, Maibritt, Lindberg, Johan, Grönberg, Henrik, Laere, Bram De, Jensen, Jørgen Bjerggaard, Borre, Michael, Andersen, Claus Lindbjerg, Sørensen, Karina Dalsgaard, Maretty, Lasse, and Besenbacher, Søren
- Published
- 2022
- Full Text
- View/download PDF
12. Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios
- Author
-
Besenbacher, Søren, Liu, Siyang, Izarzugaza, José M. G., Grove, Jakob, Belling, Kirstine, Bork-Jensen, Jette, Huang, Shujia, Als, Thomas D., Li, Shengting, Yadav, Rachita, Rubio-García, Arcadio, Lescai, Francesco, Demontis, Ditte, Rao, Junhua, Ye, Weijian, Mailund, Thomas, Friborg, Rune M., Pedersen, Christian N. S., Xu, Ruiqi, Sun, Jihua, Liu, Hao, Wang, Ou, Cheng, Xiaofang, Flores, David, Rydza, Emil, Rapacki, Kristoffer, Damm Sørensen, John, Chmura, Piotr, Westergaard, David, Dworzynski, Piotr, Sørensen, Thorkild I. A., Lund, Ole, Hansen, Torben, Xu, Xun, Li, Ning, Bolund, Lars, Pedersen, Oluf, Eiberg, Hans, Krogh, Anders, Børglum, Anders D., Brunak, Søren, Kristiansen, Karsten, Schierup, Mikkel H., Wang, Jun, Gupta, Ramneek, Villesen, Palle, and Rasmussen, Simon
- Published
- 2015
- Full Text
- View/download PDF
13. A fast algorithm for genome-wide haplotype pattern mining
- Author
-
Pedersen Christian NS, Besenbacher Søren, and Mailund Thomas
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. Results: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. Conclusion: The new algorithm speeds up the HPM method and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.
- Published
- 2009
- Full Text
- View/download PDF
14. The Mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates.
- Author
-
Bergeron, Lucie A., Besenbacher, Søren, Turner, Tychele, Versoza, Cyril J., Wang, Richard J., Price, Alivia Lee, Armstrong, Ellie, Riera, Meritxell, Carlson, Jedidiah, Hwei-yen Chen, Hahn, Matthew W., Harris, Kelley, Kleppe, April Snøfrid, López-Nandam, Elora H., Moorjani, Priya, Pfeifer, Susanne P., Tiley, George P., Yoder, Anne D., Guojie Zhang, and Schierup, Mikkel H.
- Subjects
*GERM cells, *MACAQUES, *GENETIC mutation, *RHESUS monkeys, *STANDARDIZATION
- Abstract
In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and appropriately accounting for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a 'Mutationathon,' a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details into the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. Author Correction: Direct estimation of mutations in great apes reconciles phylogenetic dating (Nature Ecology & Evolution, (2019), 3, 2, (286-292), 10.1038/s41559-018-0778-x)
- Author
-
Besenbacher, Søren, Hvilsom, Christina, Marques-Bonet, Tomas, Mailund, Thomas, and Schierup, Mikkel Heide
- Published
- 2019
16. RBT—a tool for building refined Buneman trees
- Author
-
Besenbacher, Søren, Mailund, Thomas, Westh-Nielsen, Lasse, and Pedersen, Christian N. S.
- Published
- 2005
17. Studying mutation rate evolution in primates—a need for systematic comparison of computational pipelines.
- Author
-
Bergeron, Lucie A, Besenbacher, Søren, Schierup, Mikkel H, and Zhang, Guojie
- Subjects
*GENETIC mutation, *RHESUS monkeys, *PRIMATES, *GERM cells, *BEST practices
- Abstract
The lack of consensus methods to estimate germline mutation rates from pedigrees has led to substantial differences in computational pipelines in the published literature. Here, we answer Susanne Pfeifer's opinion piece discussing the pipeline choices of our recent article estimating the germline mutation rate of rhesus macaques (Macaca mulatta). We acknowledge the differences between the method that we applied and the one preferred by Pfeifer. Yet, we advocate for full transparency and justification of choices as long as rigorous comparison of pipelines remains absent because it is the only way to conclude on best practices for the field. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
18. The germline mutational process in rhesus macaque and its implications for phylogenetic dating.
- Author
-
Bergeron, Lucie A, Besenbacher, Søren, Bakker, Jaco, Zheng, Jiao, Li, Panyi, Pacheco, George, Sinding, Mikkel-Holger S, Kamilari, Maria, Gilbert, M Thomas P, Schierup, Mikkel H, and Zhang, Guojie
- Subjects
*RHESUS monkeys, *MOLECULAR clock, *MACAQUES, *GERM cells, *APES, *PRIMATES
- Abstract
Background: Understanding the rate and pattern of germline mutations is of fundamental importance for understanding evolutionary processes. Results: Here we analyzed 19 parent-offspring trios of rhesus macaques (Macaca mulatta) at high sequencing coverage of ∼76× per individual and estimated a mean rate of 0.77 × 10^-8 de novo mutations per site per generation (95% CI: 0.69 × 10^-8 to 0.85 × 10^-8). By phasing 50% of the mutations to parental origins, we found that the mutation rate is positively correlated with the paternal age. The paternal lineage contributed a mean of 81% of the de novo mutations, with a trend of an increasing male contribution for older fathers. Approximately 3.5% of de novo mutations were shared between siblings, with no parental bias, suggesting that they arose from early development (postzygotic) stages. Finally, the divergence times between closely related primates calculated on the basis of the yearly mutation rate of rhesus macaque generally reconcile with divergence estimated with molecular clock methods, except for the Cercopithecoidea/Hominoidea molecular divergence dated at 58 Mya using our new estimate of the yearly mutation rate. Conclusions: When compared to the traditional molecular clock methods, new estimated rates from pedigree samples can provide insights into the evolution of well-studied groups such as primates. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
19. Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios
- Author
-
Matey-Hernandez, María Luisa, Maretty, Lasse, Jensen, Jacob Malte, Petersen, Bent, Andreas Sibbesen, Jonas, Liu, Siyang, Villesen, Palle, Skov, Laurits, Belling, Kirstine, Theil Have, Christian, Gonzalez-Izarzugaza, Jose Maria, Grosjean, Marie, Bork-Jensen, Jette, Grove, Jakob, Als, Thomas D., Huang, Shujia, Chang, Yuqi, Xu, Ruiqi, Ye, Weijian, Rao, Junhua, Guo, Xiaosen, Sun, Jihua, Cao, Hongzhi, Ye, Chen, Beusekom, Johan v., Espeseth, Thomas, Flindt, Esben N., Friborg, Rune M., Halager, Anders Egerup, Le Hellard, Stephanie, Hultman, Christina M., Lescai, Francesco, Li, Shengting, Lund, Ole, Løngren, Peter, Mailund, Thomas, Mors, Ole, Pedersen, Christian N. S., Sicheritz-Pontén, Thomas, Sullivan, Patrick F., Ali , Syed, Westergaard, David, Yadav, Rachita, Li, Ning, Xu, Xun, Hansen, Torben, Krogh, Anders, Bolund, Lars, Sørensen, Thorkild I. A., Pedersen, Oluf, Gupta, Ramneek, Besenbacher, Søren, Børglum, Anders D., Wang, Jun, Eiberg, Hans, Kristiansen, Karsten, Brunak, Søren, Schierup, Mikkel Heide, and Izarzugaza, Jose M. G.
- Subjects
0301 basic medicine, Parents, Clinical genomics, Genotyping Techniques, Population genetics, Computational biology, Human leukocyte antigen, Biology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Genome, Deep sequencing, 03 medical and health sciences, 0302 clinical medicine, SDG 3 - Good Health and Well-being, Structural Biology, HLA Antigens, Humans, HLA genotyping, Family, Allele, lcsh:QH301-705.5, Molecular Biology, Allele frequency, Whole genome sequencing, Sweden, Applied Mathematics, Histocompatibility Testing, Genomics, Computer Science Applications, Benchmarking, 030104 developmental biology, lcsh:Biology (General), 030220 oncology & carcinogenesis, NGS, lcsh:R858-859.7, DNA microarray, Prediction, Research Article
- Abstract
BACKGROUND: The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advances in next-generation sequencing (NGS) methodologies, have increased the need for precise computational typing methods. RESULTS: We tested two widespread HLA typing methods using high quality full genome sequencing data from 150 individuals in 50 family trios from the Genome Denmark project. First, we computed descendant accuracies assessing the agreement in the inheritance of alleles from parents to offspring. Second, we compared the locus-specific homozygosity rates as well as the allele frequencies; and we compared those to the observed values in related populations. We provide guidelines for testing the accuracy of HLA typing methods by comparing family information, which is independent of the availability of curated alleles. CONCLUSIONS: Although current computational methods for HLA typing generally provide satisfactory results, our benchmark - using data with ultra-high sequencing depth - demonstrates the incompleteness of current reference databases, and highlights the importance of providing genomic databases addressing current sequencing standards, a problem yet to be resolved before benefiting fully from personalised medicine approaches HLA phenotyping is essential.
- Published
- 2018
20. Whole genome association mapping by incompatibilities and local perfect phylogenies
- Author
-
Besenbacher Søren, Mailund Thomas, and Schierup Mikkel H
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of studies, fast and efficient analysis methods that scale to such data set sizes must be developed. Results: We present a fast method for accurate localisation of disease causing variants in high density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status, and scored by its accuracy as a decision tree. The rationale for this is that the perfect phylogeny near a disease affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatibility markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing of other methods in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease causing mutation and a constant recombination rate.
However, when it comes to more complex scenarios of mutation heterogeneity and more complex haplotype structure such as found in the HapMap data, our method outperforms SMA as well as other fast, data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM) despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the known susceptibility variants in an empirical data set – the ΔF508 mutation for cystic fibrosis – where the susceptibility variant is already known – and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusion: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
- Published
- 2006
- Full Text
- View/download PDF
21. Analysis of 62 hybrid assembled human Y chromosomes exposes rapid structural changes and high rates of gene conversion
- Author
-
Gonzalez-Izarzugaza, Jose Maria, Skov, Laurits, Maretty, Lasse, Jensen, Jacob Malte, Petersen, Bent, Andreas Sibbesen, Jonas, Liu, Siyang, Villesen, Palle, Belling, Kirstine González-Izarzugaza, Theil Have, Christian, Grosjean, Marie, Bork-Jensen, Jette, Grove, Jakob, Als, Thomas D., Huang, Shujia, Chang, Yuqi, Xu, Ruiqi, Ye, Weijian, Rao, Junhua, Guo, Xiaosen, Sun, Jihua, Cao, Hongzhi, Ye, Chen, van Beusekom, Johan, Espeseth, Thomas, Flindt, Esben, Friborg, Rune M., Halager, Anders E., Le Hellard, Stephanie, Hultman, Christina M., Lescai, Francesco, Li, Shengting, Lund, Ole, Løngren, Peter, Mailund, Thomas, Matey-Hernandez, María Luisa, Mors, Ole, Pedersen, Christian N. S., Sicheritz-Pontén, Thomas, Sullivan, Patrick F., Qaswar Ali Shah, Syed, Westergaard, David, Yadav, Rachita, Li, Ning, Xu, Xun, Hansen, Torben, Krogh, Anders, Bolund, Lars, Sørensen, Thorkild I. A., Pedersen, Oluf, Gupta, Ramneek, Rasmussen, Simon, Besenbacher, Søren, Børglum, Anders D., Wang, Jun, Eiberg, Hans, Kristiansen, Karsten, Brunak, Søren, and Schierup, Mikkel Heide
- Subjects
0301 basic medicine, Male, Cancer Research, Inverted repeat, Denmark, Biochemistry, Haplogroup, Fathers, 0302 clinical medicine, INDEL Mutation, Heterochromatin, MUTATION, Genetics (clinical), Phylogeny, POPULATION, Data Management, Genetics, Sex Chromosomes, Insertion Mutation, Chromosome Biology, Phylogenetic Analysis, Y Chromosomes, Nucleic acids, Phylogenetics, GENOME, ALIGNMENT, Deletion Mutation, Mutation (genetic algorithm), Research Article, EXPRESSION, Computer and Information Sciences, lcsh:QH426-470, DNA recombination, Gene Conversion, Biology, Y chromosome, Polymorphism, Single Nucleotide, SEQUENCE, Chromosomes, Nuclear Family, Evolution, Molecular, 03 medical and health sciences, Humans, Evolutionary Systematics, Gene conversion, Insertion, Indel, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Infertility, Male, Taxonomy, COPY NUMBER VARIATION, Evolutionary Biology, Chromosomes, Human, Y, Population Biology, MALE-INFERTILITY, Inverted Repeat Sequences, Biology and Life Sciences, Cell Biology, DNA, POLYMORPHISM, EVOLUTION, lcsh:Genetics, 030104 developmental biology, Evolutionary biology, Mutation, Haplogroups, 030217 neurology & neurosurgery, Population Genetics, Reference genome
- Abstract
The human Y-chromosome does not recombine across its male-specific part and is therefore an excellent marker of human migrations. It also plays an important role in male fertility. However, its evolution is difficult to fully understand because of repetitive sequences, inverted repeats and the potentially large role of gene conversion. Here we perform an evolutionary analysis of 62 Y-chromosomes of Danish descent sequenced using a wide range of library insert sizes and high coverage, thus allowing large regions of these chromosomes to be well assembled. These include 17 father-son pairs, which we use to validate variation calling. Using a recent method that can integrate variants based on both mapping and de novo assembly, we genotype 10898 SNVs and 2903 indels (max length of 27241 bp) in our sample and show by father-son concordance and experimental validation that the non-recurrent SNP and indel variation on the Y chromosome tree is called very accurately. This includes variation called in a 0.9 Mb centromeric heterochromatic region, which is by far the most variable in the Y chromosome. Among the variation are also longer sequence stretches not present in the reference genome but shared with the chimpanzee Y chromosome. We analyzed 2.7 Mb of large inverted repeats (palindromes) for variation patterns among the two palindrome arms and identified 603 mutations and 416 gene conversion events. We find clear evidence for GC-biased gene conversion in the palindromes (and a balancing AT mutation bias), but irrespective of this, also a strong bias towards gene conversion towards the ancestral state, suggesting that palindromic gene conversion may alleviate Muller’s ratchet.
Finally, we also find a large number of large-scale gene duplications and deletions in the palindromic regions (at least 24) and find that such events can consist of complex combinations of simultaneous insertions and deletions of long stretches of the Y chromosome. Author summary: The Y chromosome is extraordinary in many respects; it is non-recombining along most of its length, it carries many testis-expressed genes that are often found in palindromes and thus in several copies, and it is generally highly repetitive with very few unique genes. Its evolutionary process is not well understood in general because short-read mapping in such complex sequence is difficult. We combine de novo assembly and mapping to investigate evolution in more than 60% of the length of 62 Y chromosomes of Danish descent. We find that Y chromosome evolution is very dynamic even among the set of closely related Y chromosomes in Denmark with many cases of complex duplications and deletions of large regions including whole genes, clear evidence of GC-biased gene conversion in the palindromes and a tendency for gene conversion to revert mutations to their ancestral state.
- Published
- 2017
22. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data
- Author
-
Bertl, Johanna, Guo, Qianyun, Rasmussen, Malene Juul, Besenbacher, Søren, Nielsen, Morten Muhlig, Hornshøj, Henrik, Pedersen, Jakob Skou, and Hobolth, Asger
- Abstract
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms.
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
- Published
- 2017
23. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale
- Author
-
Liu, Siyang, Huang, Shujia, Rao, Junhua, Ye, Weijian, Schierup, Mikkel H., Villesen, Palle, Xu, Xun, Li, Ning, Kristiansen, Karsten, Sørensen, Thorkild I. A., Hansen, Torben, Pedersen, Oluf, Brunak, Søren, Gupta, Ramneek, Rasmussen, Simon, Lund, Ole, Bolund, Lars, Børglum, Anders D., Eiberg, Hans, Nørgaard Flindt, Esben, Xu, Ruiqi, Sun, Jihua, Liu, Hao, Jiang, Hui, Wang, Ou, Cheng, Xiaofang, Demontis, Ditte, Besenbacher, Søren, Mailund, Thomas, Friborg, Rune M., Pedersen, Christian N. S., Chang, Yuqi, Li, Shengting, Guo, Xiaosen, Cao, Hongzhi, Ye, Chen, Maretty, Lasse, Andreas Sibbesen, Jonas, Albrechtsen, Anders, Bork-Jensen, Jette, Theil Have, Christian, Gonzalez-Izarzugaza, Jose Maria, Belling, Kirstine González-Izarzugaza, Yadav, Rachita, Grove, Jakob, Dam-Als, Thomas, Lescai, Francesco, Krogh, Anders, and Wang, Jun
- Subjects
Novel sequence, Genotype, Sequence analysis, Population, Sequence assembly, Health Informatics, Single-nucleotide polymorphism, Genomics, Computational biology, Biology, de novo assembly, Genome, Structural variation, SDG 3 - Good Health and Well-being, Technical Note, De novo assembly, Humans, education, Genetics, education.field_of_study, Genome, Human, Genetic Variation, High-Throughput Nucleotide Sequencing, Computer Science Applications, Human genome, Sequence Analysis, Software
- Abstract
Background: Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. Findings: We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Conclusions: Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure. Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0103-4) contains supplementary material, which is available to authorized users.
- Published
- 2015
24. Association mapping and disease: evolutionary perspectives
- Author
-
Besenbacher, Søren, Mailund, Thomas, and Schierup, Mikkel H
- Abstract
In this chapter, we give a short introduction to the genetics of complex disease with special emphasis on evolutionary models for disease genes and the effect of different models on the genetic architecture, and finally give a survey of the state-of-the-art of genome-wide association studies.
- Published
- 2012
25. A fast algorithm for genome wide haplotype pattern mining
- Author
-
Besenbacher, Søren, Mailund, Thomas, and Pedersen, Christian Storm
- Published
- 2009
26. Challenges in whole-genome association mapping
- Author
-
Besenbacher, Søren
- Published
- 2008
27. Pathway Analysis of Skin from Psoriasis Patients after Adalimumab Treatment Reveals New Early Events in the Anti-Inflammatory Mechanism of Anti-TNF-α.
- Author
-
Langkilde, Ane, Olsen, Lene C., Sætrom, Pål, Drabløs, Finn, Besenbacher, Søren, Raaby, Line, Johansen, Claus, and Iversen, Lars
- Subjects
ADALIMUMAB, PSORIASIS treatment, PSORIASIS, ANTI-inflammatory agents, DENDRITIC cells, PATIENTS, THERAPEUTICS
- Abstract
Psoriasis is a chronic cutaneous inflammatory disease. The immunopathogenesis is a complex interplay between T cells, dendritic cells and the epidermis in which T cells and dendritic cells maintain skin inflammation. Anti-tumour necrosis factor (anti-TNF)-α agents have been approved for therapeutic use across a range of inflammatory disorders including psoriasis, but the anti-inflammatory mechanisms of anti-TNF-α in lesional psoriatic skin are not fully understood. We investigated early events in skin from psoriasis patients after treatment with anti-TNF-α antibodies by use of bioinformatics tools. We used the Human Gene 1.0 ST Array to analyse gene expression in punch biopsies taken from psoriatic patients before and also 4 and 14 days after initiation of treatment with the anti-TNF-α agent adalimumab. The gene expression was analysed by gene set enrichment analysis using the Functional Annotation Tool from DAVID Bioinformatics Resources. The most enriched pathway was visualised by the Pathview Package on Kyoto Encyclopedia of Genes and Genomes (KEGG) graphs. The analysis revealed new very early events in psoriasis after adalimumab treatment. Some of these events have been described after longer periods of anti-TNF-α treatment when clinical and histological changes appear, suggesting that effects of anti-TNF-α treatment on gene expression appear very early before clinical and histological changes. Combining microarray data on biopsies from psoriasis patients with pathway analysis allowed us to integrate in vitro findings into the identification of mechanisms that may be important in vivo. Furthermore, these results may reflect primary effect of anti-TNF-α treatment in contrast to studies of gene expression changes following clinical and histological changes, which may reflect secondary changes correlated to the healing of the skin. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
28. Multi-nucleotide de novo Mutations in Humans.
- Author
-
Besenbacher, Søren, Sulem, Patrick, Helgason, Agnar, Helgason, Hannes, Kristjansson, Helgi, Jonasdottir, Aslaug, Jonasdottir, Adalbjorg, Magnusson, Olafur Th., Thorsteinsdottir, Unnur, Masson, Gisli, Kong, Augustine, Gudbjartsson, Daniel F., and Stefansson, Kari
- Subjects
- *GENETIC mutation, *DNA, *SINGLE nucleotide polymorphisms, *GENETIC recombination, *HUMAN beings
- Abstract
Mutation of the DNA molecule is one of the most fundamental processes in biology. In this study, we use 283 parent-offspring trios to estimate the rate of mutation for both single nucleotide variants (SNVs) and short length variants (indels) in humans and examine the mutation process. We found 17812 SNVs, corresponding to a mutation rate of 1.29 × 10⁻⁸ per position per generation (PPPG) and 1282 indels corresponding to a rate of 9.29 × 10⁻¹⁰ PPPG. We estimate that around 3% of human de novo SNVs are part of a multi-nucleotide mutation (MNM), with 558 (3.1%) of mutations positioned less than 20 kb from another mutation in the same individual (median distance of 525 bp). The rate of de novo mutations is greater in late replicating regions (p = 8.29 × 10⁻¹⁹) and nearer recombination events (p = 0.0038) than elsewhere in the genome. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
29. RBT - A Tool for Building Refined Buneman Trees
- Author
-
Besenbacher, Søren, Mailund, Thomas, Westh-Nielsen, L., and Pedersen, Christian Storm
- Abstract
Publication date: December 2004
- Published
- 2004
30. Association Mapping and Disease: Evolutionary Perspectives.
- Author
-
Besenbacher, Søren, Mailund, Thomas, and Schierup, Mikkel H.
- Published
- 2012
- Full Text
- View/download PDF
31. Indexing and Searching a Mass Spectrometry Database.
- Author
-
Besenbacher, Søren, Schwikowski, Benno, and Stoye, Jens
- Abstract
Database preprocessing in order to create an index often permits considerable speedup in search compared to the iterated query of an unprocessed database. In this paper we apply index-based database lookup to a range search problem that arises in mass spectrometry-based proteomics: given a large collection of sparse integer sets and a sparse query set, find all the sets from the collection that have at least k integers in common with the query set. This problem arises when searching for a mass spectrum in a database of theoretical mass spectra using the shared peaks count as similarity measure. The algorithms can easily be modified to use the more advanced shared peaks intensity measure instead of the shared peaks count. We introduce three different algorithms solving these problems. We conclude by presenting some experiments using the algorithms on realistic data showing the advantages and disadvantages of the algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
32. Identifying disease associated genes by network propagation.
- Author
-
Yu Qian, Besenbacher, Søren, Mailund, Thomas, and Schierup, Mikkel Heide
- Abstract
Background: Genome-wide association studies have identified many individual genes associated with complex traits. However, pathway and network information has not been fully exploited in searches for genetic determinants, and including this information may increase our understanding of the underlying biology of common diseases. Results: In this study, we propose a framework to address this problem in a principled way, with the underlying hypothesis that complex disease operates through multiple connected genes. Associations inferred from GWAS are translated into prior scores for vertices in a protein-protein interaction network, and these scores are propagated through the network. Permutation is used to select genes that are guilty-by-association and thus consistently obtain high scores after network propagation. We apply the approach to data of Crohn’s disease and call candidate genes that have been reported by other independent GWAS, but not in the analysed data set. A prediction model based on these candidate genes shows good predictive power as measured by Area Under the Receiver Operating Curve (AUC) in 10-fold cross-validation. Conclusions: Our network propagation method applied to a genome-wide association study increases association findings over other approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
33. Local Phylogeny Mapping of Quantitative Traits: Higher Accuracy and Better Ranking Than Single-Marker Association in Genomewide Scans.
- Author
-
Besenbacher, Søren, Mailund, Thomas, and Schierup, Mikkel H.
- Subjects
- *PLANT phylogeny, *PHYLOGENY, *GENE mapping, *BIOLOGICAL variation, *PLANT genetics, *GENETIC markers, *BIOMARKERS, *SIMULATION methods & models
- Abstract
We present a new method, termed QBlossoc, for linkage disequilibrium (LD) mapping of genetic variants underlying a quantitative trait. The method uses principles similar to a previously published method, Blossoc, for LD mapping of case/control studies. The method builds local genealogies along the genome and looks for a significant clustering of quantitative trait values in these trees. We analyze its efficiency in terms of localization and ranking of true positives among a large number of negatives and compare the results with single-marker approaches. Simulation results of markers at densities comparable to contemporary genotype chips show that QBlossoc is more accurate in localization of true positives as expected since it uses the additional information of LD between markers simultaneously. More importantly, however, for genomewide surveys, QBlossoc places regions with true positives higher on a ranked list than single-marker approaches, again suggesting that a true signal displays itself more strongly in a set of adjacent markers than a spurious (false) signal. The method is both memory and central processing unit (CPU) efficient. It has been tested on a real data set of height data for 5000 individuals measured at ~317,000 markers and completed analysis within 5 CPU days. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
34. Common variants at 19p13 are associated with susceptibility to ovarian cancer
- Author
-
Bolton, Kelly L, Tyrer, Jonathan, Song, Honglin, Ramus, Susan J, Notaridou, Maria, Jones, Chris, Sher, Tanya, Gentry-Maharaj, Aleksandra, Wozniak, Eva, Tsai, Ya-Yu, Weidhaas, Joanne, Paik, Daniel, Van Den Berg, David J, Stram, Daniel O, Pearce, Celeste Leigh, Wu, Anna H, Brewster, Wendy, Anton-Culver, Hoda, Ziogas, Argyrios, Narod, Steven A, Levine, Douglas A, Kaye, Stanley B, Brown, Robert, Paul, Jim, Flanagan, James, Sieh, Weiva, McGuire, Valerie, Whittemore, Alice S, Campbell, Ian, Gore, Martin E, Lissowska, Jolanta, Yang, Hanna P, Medrek, Krzysztof, Gronwald, Jacek, Lubinski, Jan, Jakubowska, Anna, Le, Nhu D, Cook, Linda S, Kelemen, Linda E, Brook-Wilson, Angela, Massuger, Leon F A G, Kiemeney, Lambertus A, Aben, Katja K H, van Altena, Anne M, Houlston, Richard, Tomlinson, Ian, Palmieri, Rachel T, Moorman, Patricia G, Schildkraut, Joellen, Iversen, Edwin S, Phelan, Catherine, Vierkant, Robert A, Cunningham, Julie M, Goode, Ellen L, Fridley, Brooke L, Kruger-Kjaer, Susan, Blaeker, Jan, Hogdall, Estrid, Hogdall, Claus, Gross, Jenny, Karlan, Beth Y, Ness, Roberta B, Edwards, Robert P, Odunsi, Kunle, Moyisch, Kirsten B, Baker, Julie A, Modugno, Francesmary, Heikkinenen, Tuomas, Butzow, Ralf, Nevanlinna, Heli, Leminen, Arto, Bogdanova, Natalia, Antonenkova, Natalia, Doerk, Thilo, Hillemanns, Peter, Dürst, Matthias, Runnebaum, Ingo, Thompson, Pamela J, Carney, Michael E, Goodman, Marc T, Lurie, Galina, Wang-Gohrke, Shan, Hein, Rebecca, Chang-Claude, Jenny, Rossing, Mary Anne, Cushing-Haugen, Kara L, Doherty, Jennifer, Chen, Chu, Rafnar, Thorunn, Besenbacher, Soren, Sulem, Patrick, Stefansson, Kari, Birrer, Michael James, Terry, Kathryn Lynne, Hernandez, Dena, Cramer, Daniel William, Vergote, Ignace, Amant, Frederic, Lambrechts, Diether, Despierre, Evelyn, Fasching, Peter A, Beckmann, Matthias W, Thiel, Falk C, Ekici, Arif B, Chen, Xiaoqing, Johnatty, Sharon E, Webb, Penelope M, Beesley, Jonathan, Chanock, Stephen, Garcia-Closas, Montserrat, Sellers, Tom, Easton, Douglas F, Berchuck, Andrew, Chenevix-Trench, Georgia, Pharoah, Paul D P, and Gayther, Simon A
- Abstract
Epithelial ovarian cancer (EOC) is the leading cause of death from gynecological malignancy in the developed world, accounting for 4 percent of deaths from cancer in women. We performed a three-phase genome-wide association study of EOC survival in 8,951 EOC cases with available survival time data, and a parallel association analysis of EOC susceptibility. Two SNPs at 19p13.11, rs8170 and rs2363956, showed evidence of association with survival (overall P=5×10⁻⁴ and 6×10⁻⁴), but did not replicate in phase 3. However, the same two SNPs demonstrated genome-wide significance for risk of serous EOC (P=3×10⁻⁹ and 4×10⁻¹¹ respectively). Expression analysis of candidate genes at this locus in ovarian tumors supported a role for the BRCA1 interacting gene C19orf62, also known as MERIT40, which contains rs8170, in EOC development.
- Published
- 2010
- Full Text
- View/download PDF
35. Hierarchical Classification of Cancers of Unknown Primary Using Multi-Omics Data.
- Author
-
Bavafaye Haghighi, Elham, Knudsen, Michael, Elmedal Laursen, Britt, and Besenbacher, Søren
- Subjects
CANCER of unknown primary origin, TUMOR classification, HIERARCHICAL clustering (Cluster analysis), MISSING data (Statistics), SOMATIC mutation, METASTASIS
- Abstract
A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to locate the primary cancer. As standard treatments are based on the cancer type, such cases are hard to treat and have very poor prognosis. Using molecular data from the metastatic cancer to predict the primary site can make treatment choice easier and enable targeted therapy. In this article, we first examine the ability to predict cancer type using different types of omics data. Methylation data lead to slightly better prediction than gene expression and both these are superior to classification using somatic mutations. After using 3 data types independently, we notice some differences between the classes that tend to be misclassified, suggesting that integrating the data might improve accuracy. In light of the different levels of information provided by different omics types and to be able to handle missing data, we perform multi-omics classification by hierarchically combining the classifiers. The proposed hierarchical method first classifies based on the most informative type of omics data and then uses the other types of omics data to classify samples that did not get a high confidence classification in the first step. The resulting hierarchical classifier has higher accuracy than any of the single omics classifiers and thus proves that the combination of different data types is beneficial. Our results show that using multi-omics data can improve the classification of cancer types. We confirm this by testing our method on metastatic cancers from the MET500 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
36. Association Mapping and Disease: Evolutionary Perspectives.
- Author
-
Besenbacher S, Mailund T, Vilhjálmsson BJ, and Schierup MH
- Subjects
- Alleles, Computational Biology methods, Confounding Factors, Epidemiologic, Evolution, Molecular, Gene Frequency, Humans, Models, Genetic, Models, Statistical, Chromosome Mapping, Genetic Predisposition to Disease, Genetic Variation, Genome-Wide Association Study
- Abstract
In this chapter, we give a short introduction to the genetics of complex diseases emphasizing evolutionary models for disease genes and the effect of different models on the genetic architecture, and we give a survey of the state-of-the-art of genome-wide association studies (GWASs).
- Published
- 2019
- Full Text
- View/download PDF
37. Proteomic profiling identifies outcome-predictive markers in patients with peripheral T-cell lymphoma, not otherwise specified.
- Author
-
Ludvigsen M, Bjerregård Pedersen M, Lystlund Lauridsen K, Svenstrup Poulsen T, Hamilton-Dutoit SJ, Besenbacher S, Bendix K, Møller MB, Nørgaard P, d'Amore F, and Honoré B
- Subjects
- Aldehyde Dehydrogenase, Mitochondrial metabolism, Biomarkers, Tumor metabolism, Chromatography, Liquid, Computational Biology, DNA-Binding Proteins metabolism, Female, Humans, Lymphoid Tissue metabolism, Lymphoma, T-Cell, Peripheral genetics, Male, Phosphopyruvate Hydratase metabolism, Prognosis, Tandem Mass Spectrometry, Tumor Suppressor Proteins metabolism, Lymphoma, T-Cell, Peripheral metabolism, Lymphoma, T-Cell, Peripheral mortality, Proteome, Proteomics methods
- Abstract
Peripheral T-cell lymphoma, not otherwise specified (PTCL-NOS) constitutes a heterogeneous category of lymphomas, which do not fit into any of the specifically defined T-cell lymphoma entities. Both the pathogenesis and tumor biology in PTCL-NOS are poorly understood. Protein expression in pretherapeutic PTCL-NOS tumors was analyzed by proteomics. Differentially expressed proteins were compared in 3 distinct scenarios: (A) PTCL-NOS tumor tissue (n = 18) vs benign lymphoid tissue (n = 8), (B) clusters defined by principal component analysis (PCA), and (C) tumors from patients with chemosensitive vs refractory PTCL-NOS. Selected differentially expressed proteins identified by proteomics were correlated with clinico-pathological features and outcome in a larger cohort of patients with PTCL-NOS (n = 87) by immunohistochemistry (IHC). Most proteins with altered expression were identified comparing PTCL-NOS vs benign lymphoid tissue. PCA of the protein profile defined 3 distinct clusters. All benign samples clustered together, whereas PTCL-NOS tumors separated into 2 clusters with different patient overall survival rates (P = .001). Differentially expressed proteins reflected large biological diversity among PTCL-NOS, particularly associated with alterations of "immunological" pathways. The 2 PTCL-NOS subclusters defined by PCA showed disturbance of "stress-related" and "protein metabolic" pathways. α-Enolase 1 (ENO1) was found differentially expressed in all 3 analyses, and high intratumoral ENO1 expression evaluated by IHC correlated with poor outcome (hazard ratio, 2.09; 95% confidence interval, 1.17-3.73; P = .013). High expression of triosephosphate isomerase (TPI1) also showed a tendency to correlate with poor survival (P = .057). In conclusion, proteomic profiling of PTCL-NOS provided evidence of markedly altered protein expression and identified ENO1 as a novel potential prognostic marker. (© 2018 by The American Society of Hematology.)
- Published
- 2018
- Full Text
- View/download PDF
38. Assembly and analysis of 100 full MHC haplotypes from the Danish population.
- Author
-
Jensen JM, Villesen P, Friborg RM, Mailund T, Besenbacher S, and Schierup MH
- Subjects
- Alleles, Chromosome Mapping, Denmark, Haplotypes genetics, Humans, Polymorphism, Single Nucleotide genetics, Genetic Variation genetics, Genetics, Population, Linkage Disequilibrium genetics, Major Histocompatibility Complex genetics
- Abstract
Genes in the major histocompatibility complex (MHC, also known as HLA) play a critical role in the immune response and variation within the extended 4-Mb region shows association with major risks of many diseases. Yet, deciphering the underlying causes of these associations is difficult because the MHC is the most polymorphic region of the genome with a complex linkage disequilibrium structure. Here, we reconstruct full MHC haplotypes from de novo assembled trios without relying on a reference genome and perform evolutionary analyses. We report 100 full MHC haplotypes and call a large set of structural variants in the regions for future use in imputation with GWAS data. We also present the first complete analysis of the recombination landscape in the entire region and show how balancing selection at classical genes has linked effects on the frequency of variants throughout the region. (© 2017 Jensen et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2017
- Full Text
- View/download PDF
39. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.
- Author
-
Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P, Skov L, Belling K, Theil Have C, Izarzugaza JMG, Grosjean M, Bork-Jensen J, Grove J, Als TD, Huang S, Chang Y, Xu R, Ye W, Rao J, Guo X, Sun J, Cao H, Ye C, van Beusekom J, Espeseth T, Flindt E, Friborg RM, Halager AE, Le Hellard S, Hultman CM, Lescai F, Li S, Lund O, Løngren P, Mailund T, Matey-Hernandez ML, Mors O, Pedersen CNS, Sicheritz-Pontén T, Sullivan P, Syed A, Westergaard D, Yadav R, Li N, Xu X, Hansen T, Krogh A, Bolund L, Sørensen TIA, Pedersen O, Gupta R, Rasmussen S, Besenbacher S, Børglum AD, Wang J, Eiberg H, Kristiansen K, Brunak S, and Schierup MH
- Subjects
- Adult, Alleles, Child, Chromosomes, Human, Y genetics, Denmark, Female, Haplotypes genetics, Humans, Major Histocompatibility Complex genetics, Male, Maternal Age, Mutation Rate, Paternal Age, Point Mutation genetics, Reference Standards, Genetic Variation genetics, Genetics, Population standards, Genome, Human genetics, Genomics standards, Sequence Analysis, DNA standards
- Abstract
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
- Published
- 2017
- Full Text
- View/download PDF
40. Identifying disease associated genes by network propagation.
- Author
-
Qian Y, Besenbacher S, Mailund T, and Schierup MH
- Subjects
- Crohn Disease pathology, Genome-Wide Association Study, Humans, Interleukin-12 metabolism, ROC Curve, Signal Transduction genetics, Computational Biology, Crohn Disease genetics, Crohn Disease metabolism, Protein Interaction Maps
- Abstract
Background: Genome-wide association studies have identified many individual genes associated with complex traits. However, pathway and network information has not been fully exploited in searches for genetic determinants, and including this information may increase our understanding of the underlying biology of common diseases. Results: In this study, we propose a framework to address this problem in a principled way, with the underlying hypothesis that complex disease operates through multiple connected genes. Associations inferred from GWAS are translated into prior scores for vertices in a protein-protein interaction network, and these scores are propagated through the network. Permutation is used to select genes that are guilty-by-association and thus consistently obtain high scores after network propagation. We apply the approach to data of Crohn's disease and call candidate genes that have been reported by other independent GWAS, but not in the analysed data set. A prediction model based on these candidate genes shows good predictive power as measured by Area Under the Receiver Operating Curve (AUC) in 10-fold cross-validation. Conclusions: Our network propagation method applied to a genome-wide association study increases association findings over other approaches.
- Published
- 2014
- Full Text
- View/download PDF
41. Association mapping and disease: evolutionary perspectives.
- Author
-
Besenbacher S, Mailund T, and Schierup MH
- Subjects
- Data Interpretation, Statistical, Gene Frequency, Humans, Disease genetics, Evolution, Molecular, Genome-Wide Association Study methods
- Abstract
In this chapter, we give a short introduction to the genetics of complex disease with special emphasis on evolutionary models for disease genes and the effect of different models on the genetic architecture, and finally give a survey of the state-of-the-art of genome-wide association studies.
- Published
- 2012
- Full Text
- View/download PDF
42. A fast algorithm for genome-wide haplotype pattern mining.
- Author
-
Besenbacher S, Pedersen CN, and Mailund T
- Subjects
- Databases, Genetic, Genetic Markers, Genetic Predisposition to Disease, Genetic Variation, Humans, Polymorphism, Single Nucleotide, Algorithms, Computational Biology methods, Genome, Human, Haplotypes genetics
- Abstract
Background: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. Results: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. Conclusion: The new algorithm speeds up the HPM method and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.
- Published
- 2009
- Full Text
- View/download PDF
43. Whole genome association mapping by incompatibilities and local perfect phylogenies.
- Author
-
Mailund T, Besenbacher S, and Schierup MH
- Subjects
- Cystic Fibrosis diagnosis, Humans, Phylogeny, Polymorphism, Single Nucleotide genetics, Chromosome Mapping methods, Cystic Fibrosis genetics, Cytochrome P-450 CYP2D6 genetics, DNA Mutational Analysis methods, Genetic Predisposition to Disease genetics, Linkage Disequilibrium genetics
- Abstract
Background: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of studies, fast and efficient analysis methods that scale to such data set sizes must be developed. Results: We present a fast method for accurate localisation of disease causing variants in high density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status, and scored by its accuracy as a decision tree. The rationale for this is that the perfect phylogeny near a disease affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatibility markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing of other methods in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease causing mutation and a constant recombination rate. However, when it comes to more complex scenarios of mutation heterogeneity and more complex haplotype structure such as found in the HapMap data, our method outperforms SMA as well as other fast, data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM) despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the known susceptibility variant in an empirical data set (the DeltaF508 mutation for cystic fibrosis) and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusion: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
- Published
- 2006
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library