180 results for "Kostka, Dennis"
Search Results
2. A cis-regulatory module underlies retinal ganglion cell genesis and axonogenesis
- Author
-
Mehta, Kamakshi, Daghsni, Marwa, Raeisossadati, Reza, Xu, Zhongli, Davis, Emily, Naidich, Abigail, Wang, Bingjie, Tao, Shiyue, Pi, Shaohua, Chen, Wei, Kostka, Dennis, Liu, Silvia, Gross, Jeffrey M., Kuwajima, Takaaki, and Aldiri, Issam
- Published
- 2024
- Full Text
- View/download PDF
3. FSTL-1 Attenuation Causes Spontaneous Smoke-Resistant Pulmonary Emphysema
- Author
-
Henkel, Matthew, Partyka, Jessica, Gregory, Alyssa D, Forno, Erick, Cho, Michael H, Eddens, Taylor, Tout, Andrew R, Salamacha, Nathan, Horne, William, Rao, Krithika S, Wu, Yijen, Alcorn, John F, Kostka, Dennis, Hirsch, Raphael, Celedón, Juan C, Shapiro, Steven D, Kolls, Jay K, and Campfield, Brian T
- Subjects
Biomedical and Clinical Sciences, Cardiovascular Medicine and Haematology, Clinical Sciences, Emphysema, Biotechnology, Lung, Genetics, Chronic Obstructive Pulmonary Disease, 1.1 Normal biological development and functioning, Underpinning research, Aetiology, 2.1 Biological and endogenous factors, Respiratory, Animals, Endothelial Cells, Follistatin-Related Proteins, Gene Expression Regulation, Gene Knockdown Techniques, Humans, In Vitro Techniques, Macrophages, Mice, Mutation, Nuclear Receptor Subfamily 4, Group A, Member 1, Phosphorylation, Polymorphism, Single Nucleotide, Positron Emission Tomography Computed Tomography, Pulmonary Disease, Chronic Obstructive, Pulmonary Emphysema, Single Photon Emission Computed Tomography Computed Tomography, Smoke, Tobacco, Transcription Factor RelA, X-Ray Microtomography, SNP, chronic obstructive pulmonary disease, gene expression, micro–computed tomography, Medical and Health Sciences, Respiratory System, Cardiovascular medicine and haematology, Clinical sciences
- Abstract
Rationale: The role of FSTL-1 (follistatin-like 1) in lung homeostasis is unknown. Objectives: We aimed to define the impact of FSTL-1 attenuation on lung structure and function and to identify FSTL-1-regulated transcriptional pathways in the lung. Further, we aimed to analyze the association of FSTL-1 SNPs with lung disease. Methods: FSTL-1 hypomorphic (FSTL-1 Hypo) mice underwent lung morphometry, pulmonary function testing, and micro-computed tomography. Fstl1 expression was determined in wild-type lung cell populations from three independent research groups. RNA sequencing of wild-type and FSTL-1 Hypo mice identified FSTL-1-regulated gene expression, followed by validation and mechanistic in vitro examination. FSTL1 SNP analysis was performed in the COPDGene (Genetic Epidemiology of Chronic Obstructive Pulmonary Disease) cohort. Measurements and Main Results: FSTL-1 Hypo mice developed spontaneous emphysema, independent of smoke exposure. Fstl1 is highly expressed in the lung by mesenchymal and endothelial cells but not immune cells. RNA sequencing of whole lung identified 33 FSTL-1-regulated genes, including Nr4a1, an orphan nuclear hormone receptor that negatively regulates NF-κB (nuclear factor-κB) signaling. In vitro, recombinant FSTL-1 treatment of macrophages attenuated NF-κB p65 phosphorylation in an Nr4a1-dependent manner. Within the COPDGene cohort, several SNPs in the FSTL1 region corresponded to chronic obstructive pulmonary disease and lung function. Conclusions: This work identifies a novel role for FSTL-1 protecting against emphysema development independent of smoke exposure. This FSTL-1-deficient emphysema implicates regulation of immune tolerance in lung macrophages through Nr4a1. Further study of the mechanisms involving FSTL-1 in lung homeostasis, immune regulation, and NF-κB signaling may provide additional insight into the pathophysiology of emphysema and inflammatory lung diseases.
- Published
- 2020
4. Uncompensated mitochondrial oxidative stress underlies heart failure in an iPSC-derived model of congenital heart disease
- Author
-
Xu, Xinxiu, Jin, Kang, Bais, Abha S., Zhu, Wenjuan, Yagi, Hisato, Feinstein, Timothy N., Nguyen, Phong K., Criscione, Joseph D., Liu, Xiaoqin, Beutner, Gisela, Karunakaran, Kalyani B., Rao, Krithika S., He, Haoting, Adams, Phillip, Kuo, Catherine K., Kostka, Dennis, Pryhuber, Gloria S., Shiva, Sruti, Ganapathiraju, Madhavi K., Porter, George A., Jr., Lin, Jiuann-Huey Ivy, Aronow, Bruce, and Lo, Cecilia W.
- Published
- 2022
- Full Text
- View/download PDF
5. Developmental Loci Harbor Clusters of Accelerated Regions That Evolved Independently in Ape Lineages.
- Author
-
Kostka, Dennis, Holloway, Alisha K, and Pollard, Katherine S
- Subjects
Animals, Hominidae, Humans, Evolution, Molecular, Gene Conversion, Algorithms, Models, Genetic, Computer Simulation, Selection, Genetic, positive selection, biased gene conversion, likelihood ratio test, enhancer, development, primates, Evolutionary Biology, Genetics, Biochemistry and Cell Biology
- Abstract
Some of the fastest evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched nearby regulatory genes, and several HARs function as developmental enhancers. To investigate if this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA nearby genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration consistent with positive selection or loss of constraint is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered expression of a core set of developmental genes across ape evolution.
- Published
- 2018
6. Chromatin accessibility and microRNA expression in nephron progenitor cells during kidney development
- Author
-
Clugston, Andrew, Bodnar, Andrew, Cerqueira, Débora Malta, Phua, Yu Leng, Lawler, Alyssa, Boggs, Kristy, Pfenning, Andreas R., Ho, Jacqueline, and Kostka, Dennis
- Published
- 2022
- Full Text
- View/download PDF
7. Computational profiling of hiPSC-derived heart organoids reveals chamber defects associated with NKX2-5 deficiency
- Author
-
Feng, Wei, Schriever, Hannah, Jiang, Shan, Bais, Abha, Wu, Haodi, Kostka, Dennis, and Li, Guang
- Published
- 2022
- Full Text
- View/download PDF
8. Single-cell transcriptomic analysis identifies murine heart molecular features at embryonic and neonatal stages
- Author
-
Feng, Wei, Bais, Abha, He, Haoting, Rios, Cassandra, Jiang, Shan, Xu, Juan, Chang, Cindy, Kostka, Dennis, and Li, Guang
- Published
- 2022
- Full Text
- View/download PDF
9. Modeling DNA methylation dynamics with approaches from phylogenetics
- Author
-
Capra, John A. and Kostka, Dennis
- Subjects
Quantitative Biology - Genomics
- Abstract
Methylation of CpG dinucleotides is a prevalent epigenetic modification that is required for proper development in vertebrates, and changes in CpG methylation are essential to cellular differentiation. Genome-wide DNA methylation assays have become increasingly common, and recently distinct stages across differentiating cellular lineages have been assayed. However, current methods for modeling methylation dynamics do not account for the dependency structure between precursor and dependent cell types. We developed a continuous-time Markov chain approach, based on the observation that changes in methylation state over tissue differentiation can be modeled similarly to DNA nucleotide changes over evolutionary time. This model explicitly takes precursor to descendant relationships into account and enables inference of CpG methylation dynamics. To illustrate our method, we analyzed a high-resolution methylation map of the differentiation of mouse stem cells into several blood cell types. Our model can successfully infer unobserved CpG methylation states from observations at the same sites in related cell types (90% correct), and this approach more accurately reconstructs missing data than imputation based on neighboring CpGs (84% correct). Additionally, the single CpG resolution of our methylation dynamics estimates enabled us to show that DNA sequence context of CpG sites is informative about methylation dynamics across tissue differentiation. Finally, we identified genomic regions with clusters of highly dynamic CpGs and present a likely functional example. Our work establishes a framework for inference and modeling that is well-suited to DNA methylation data, and our success suggests that other methods for analyzing DNA nucleotide substitutions will also translate to the modeling of epigenetic phenomena. Comment: 8 pages, 5 figures
- Published
- 2014
- Full Text
- View/download PDF
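Illustrative note on record 9: the abstract describes a continuous-time Markov chain over methylation states along precursor-to-descendant relationships. The sketch below is not the authors' implementation; it assumes the simplest possible case, a two-state chain (unmethylated/methylated) with hypothetical rates `rate_gain` and `rate_loss` and a hypothetical branch length `t`, and uses the standard closed-form transition matrix for such a chain.

```python
import math


def methylation_transition_matrix(rate_gain, rate_loss, t):
    """Transition probabilities P(t) for a two-state CTMC.

    State 0 = unmethylated, state 1 = methylated.
    rate_gain: rate of 0 -> 1 changes (hypothetical parameter).
    rate_loss: rate of 1 -> 0 changes (hypothetical parameter).
    t: time separating precursor and descendant cell type.
    """
    total = rate_gain + rate_loss
    decay = math.exp(-total * t)
    pi_meth = rate_gain / total    # stationary probability of state 1
    pi_unmeth = rate_loss / total  # stationary probability of state 0
    return [
        [pi_unmeth + pi_meth * decay, pi_meth * (1.0 - decay)],
        [pi_unmeth * (1.0 - decay), pi_meth + pi_unmeth * decay],
    ]


def descendant_distribution(precursor_state, rate_gain, rate_loss, t):
    """Distribution over the descendant's state given the precursor's state."""
    return methylation_transition_matrix(rate_gain, rate_loss, t)[precursor_state]


if __name__ == "__main__":
    # A methylated CpG in the precursor, observed after a short branch.
    print(descendant_distribution(1, rate_gain=0.2, rate_loss=0.1, t=0.5))
```

With more than two cell types, the same matrices would be multiplied along the differentiation tree, in the way substitution matrices are combined along a phylogeny; that step is omitted here.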
10. motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
- Author
-
Kostka, Dennis, Friedrich, Tara, Holloway, Alisha K., and Pollard, Katherine S.
- Subjects
Quantitative Biology - Genomics
- Abstract
Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
- Published
- 2014
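Illustrative note on record 10: the abstract describes the probability mass function of a difference in motif counts built from correlated Bernoulli trials. The sketch below is not the motifDiverge package or its API. It assumes a simplified setting: `n` independent, identically distributed pairs of trials `(X_i, Y_i)`, where `p10 = P(X=1, Y=0)` and `p01 = P(X=0, Y=1)` are hypothetical parameters, and it computes the distribution of `D = sum(X_i - Y_i)` by repeated convolution. The symmetric two-sided p-value is likewise a simplifying assumption.

```python
def difference_pmf(n, p10, p01):
    """PMF of D = sum_i (X_i - Y_i) over n i.i.d. correlated Bernoulli pairs.

    p10: probability that only the first sequence has the motif at a position.
    p01: probability that only the second sequence has the motif.
    Pairs where both or neither have the motif contribute 0 to D.
    Returns a dict mapping each difference d to its probability.
    """
    if p10 < 0 or p01 < 0 or p10 + p01 > 1:
        raise ValueError("p10 and p01 must be non-negative and sum to at most 1")
    p_same = max(0.0, 1.0 - p10 - p01)
    step = {-1: p01, 0: p_same, 1: p10}
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for d, prob in pmf.items():
            for delta, q in step.items():
                nxt[d + delta] = nxt.get(d + delta, 0.0) + prob * q
        pmf = nxt
    return pmf


def two_sided_p_value(pmf, observed):
    """Probability of a difference at least as extreme (in absolute value)."""
    return sum(p for d, p in pmf.items() if abs(d) >= abs(observed))


if __name__ == "__main__":
    null = difference_pmf(n=50, p10=0.05, p01=0.05)
    print(two_sided_p_value(null, observed=6))
```

Correlation between the two sequences enters through the joint probabilities: the more often both sequences share a motif, the smaller `p10` and `p01` become and the narrower the null distribution of the difference.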
11. Integrating diverse datasets improves developmental enhancer prediction
- Author
-
Erwin, Genevieve D., Truty, Rebecca M., Kostka, Dennis, Pollard, Katherine S., and Capra, John A.
- Subjects
Quantitative Biology - Genomics
- Abstract
Gene-regulatory enhancers have been identified by many lines of evidence, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a novel method for predicting developmental enhancers and their tissue specificity. EnhancerFinder uses a two-step multiple-kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and thousands of diverse functional genomics datasets from a variety of cell types and developmental stages. We trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser, in contrast to histone mark or sequence-based enhancer definitions commonly used. We comprehensively evaluated EnhancerFinder, and found that our integrative approach improves enhancer prediction accuracy over previous approaches that consider a single type of data. Our evaluation highlights the importance of considering information from many tissues when predicting specific types of enhancers. We find that VISTA enhancers active in embryonic heart are easier to predict than enhancers active in several other tissues due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and hits from genome-wide association studies. We demonstrate the utility of our enhancer predictions by identifying and validating a novel cranial nerve enhancer in the ZEB2 locus. Our genome-wide developmental enhancer predictions will be freely available as a UCSC Genome Browser track. Comment: 33 pages, 7 figures
- Published
- 2013
- Full Text
- View/download PDF
12. A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes
- Author
-
Capra, John A., Hubisz, Melissa J., Kostka, Dennis, Pollard, Katherine S., and Siepel, Adam
- Subjects
Quantitative Biology - Genomics, Quantitative Biology - Populations and Evolution
- Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts ~1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. We also find some evidence that they contribute to the fixation of deleterious alleles, including an enrichment for disease-associated polymorphisms. These tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages; they supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. Comment: 40 pages, 17 figures
- Published
- 2013
13. Single-cell RNA sequencing reveals differential cell cycle activity in key cell populations during nephrogenesis
- Author
-
Bais, Abha S., Cerqueira, Débora M., Clugston, Andrew, Bodnar, Andrew J., Ho, Jacqueline, and Kostka, Dennis
- Published
- 2021
- Full Text
- View/download PDF
14. motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences.
- Author
-
Kostka, Dennis, Friedrich, Tara, Holloway, Alisha K, and Pollard, Katherine S
- Subjects
Mathematical Sciences, Statistics, Genetics, Biotechnology, Human Genome, Generic health relevance, Testing, Gene regulation, Motif, ChIP-seq, Binomial, Transcription factor, Regulatory evolution
- Abstract
Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
- Published
- 2015
15. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes.
- Author
-
Capra, John A, Hubisz, Melissa J, Kostka, Dennis, Pollard, Katherine S, and Siepel, Adam
- Subjects
Animals, Mammals, Humans, Pan troglodytes, Chromosome Mapping, Sequence Alignment, Evolution, Molecular, Phylogeny, Recombination, Genetic, Gene Conversion, Base Sequence, Genome, Models, Theoretical, Selection, Genetic, Evolution, Molecular, Models, Theoretical, Recombination, Genetic, Selection, Genetics, Developmental Biology
- Abstract
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.
- Published
- 2013
16. A high-resolution map of human evolutionary constraint using 29 mammals.
- Author
-
Lindblad-Toh, Kerstin, Garber, Manuel, Zuk, Or, Lin, Michael F, Parker, Brian J, Washietl, Stefan, Kheradpour, Pouya, Ernst, Jason, Jordan, Gregory, Mauceli, Evan, Ward, Lucas D, Lowe, Craig B, Holloway, Alisha K, Clamp, Michele, Gnerre, Sante, Alföldi, Jessica, Beal, Kathryn, Chang, Jean, Clawson, Hiram, Cuff, James, Di Palma, Federica, Fitzgerald, Stephen, Flicek, Paul, Guttman, Mitchell, Hubisz, Melissa J, Jaffe, David B, Jungreis, Irwin, Kent, W James, Kostka, Dennis, Lara, Marcia, Martins, Andre L, Massingham, Tim, Moltke, Ida, Raney, Brian J, Rasmussen, Matthew D, Robinson, Jim, Stark, Alexander, Vilella, Albert J, Wen, Jiayu, Xie, Xiaohui, Zody, Michael C, Broad Institute Sequencing Platform and Whole Genome Assembly Team, Baldwin, Jen, Bloom, Toby, Chin, Chee Whye, Heiman, Dave, Nicol, Robert, Nusbaum, Chad, Young, Sarah, Wilkinson, Jane, Worley, Kim C, Kovar, Christie L, Muzny, Donna M, Gibbs, Richard A, Baylor College of Medicine Human Genome Sequencing Center Sequencing Team, Cree, Andrew, Dihn, Huyen H, Fowler, Gerald, Jhangiani, Shalili, Joshi, Vandita, Lee, Sandra, Lewis, Lora R, Nazareth, Lynne V, Okwuonu, Geoffrey, Santibanez, Jireh, Warren, Wesley C, Mardis, Elaine R, Weinstock, George M, Wilson, Richard K, Genome Institute at Washington University, Delehaunty, Kim, Dooling, David, Fronik, Catrina, Fulton, Lucinda, Fulton, Bob, Graves, Tina, Minx, Patrick, Sodergren, Erica, Birney, Ewan, Margulies, Elliott H, Herrero, Javier, Green, Eric D, Haussler, David, Siepel, Adam, Goldman, Nick, Pollard, Katherine S, Pedersen, Jakob S, Lander, Eric S, and Kellis, Manolis
- Subjects
Broad Institute Sequencing Platform and Whole Genome Assembly Team, Baylor College of Medicine Human Genome Sequencing Center Sequencing Team, Genome Institute at Washington University, Animals, Mammals, Humans, Disease, RNA, Sequence Alignment, Sequence Analysis, DNA, Genomics, Evolution, Molecular, Phylogeny, Genome, Genome, Human, Exons, Health, Selection, Genetic, Molecular Sequence Annotation, Sequence Analysis, DNA, Evolution, Molecular, Human, Selection, Genetic, General Science & Technology
- Abstract
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
- Published
- 2011
17. Noncoding Sequences Near Duplicated Genes Evolve Rapidly
- Author
-
Kostka, Dennis, Hahn, Matthew W, and Pollard, Katherine S
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Human Genome, Biotechnology, 5' Untranslated Regions, Animals, Binding Sites, Cercopithecidae, Evolution, Molecular, Exons, Female, Gene Duplication, Genome, Human, Genome-Wide Association Study, Humans, Macaca, Models, Genetic, Pan troglodytes, Pregnancy, Pregnancy Maintenance, RNA, Untranslated, Sequence Alignment, Species Specificity, Transcription Factors, accelerated substitution, noncoding sequence, gene duplication, Biochemistry and Cell Biology, Evolutionary Biology, Developmental Biology, Biochemistry and cell biology, Evolutionary biology
- Abstract
Gene expression divergence and chromosomal rearrangements have been put forward as major contributors to phenotypic differences between closely related species. It has also been established that duplicated genes show enhanced rates of positive selection in their amino acid sequences. If functional divergence is largely due to changes in gene expression, it follows that regulatory sequences in duplicated loci should also evolve rapidly. To investigate this hypothesis, we performed likelihood ratio tests (LRTs) on all noncoding loci within 5 kb of every transcript in the human genome and identified sequences with increased substitution rates in the human lineage since divergence from Old World Monkeys. The fraction of rapidly evolving loci is significantly higher nearby genes that duplicated in the common ancestor of humans and chimps compared with nonduplicated genes. We also conducted a genome-wide scan for nucleotide substitutions predicted to affect transcription factor binding. Rates of binding site divergence are elevated in noncoding sequences of duplicated loci with accelerated substitution rates. Many of the genes associated with these fast-evolving genomic elements belong to functional categories identified in previous studies of positive selection on amino acid sequences. In addition, we find enrichment for accelerated evolution nearby genes involved in establishment and maintenance of pregnancy, processes that differ significantly between humans and monkeys. Our findings support the hypothesis that adaptive evolution of the regulation of duplicated genes has played a significant role in human evolution.
- Published
- 2010
18. Abstract 13039: Loss of Visceral Adipose Tissue Macrophage Subsets Induces Myocardial Infarction Induced Insulin Resistance: Role of Adiponectin
- Author
-
Vasamsetti, Sathish Babu B, Zhang, Xinyi, Coppin, Emillie M, Florentin, Jonathan, Koul, Sasha, Gotberg, Matthias, Clugston, Andrew S, Thoma, Floyd, Sembrat, John, Bullock, Grant C, Kostka, Dennis, St. Croix, Claudette M, Chattopadhyay, Ansuman, Rojas, Mauricio, Mulukutla, Suresh, and Dutta, Partha
- Published
- 2020
- Full Text
- View/download PDF
19. An intrinsically interpretable neural network architecture for sequence-to-function learning
- Author
-
Balcı, Ali Tuğrul, Ebeid, Mark Maher, Benos, Panayiotis V, Kostka, Dennis, and Chikina, Maria
- Published
- 2023
- Full Text
- View/download PDF
20. Genome-wide enhancer annotations differ significantly in genomic distribution, evolution, and function
- Author
-
Benton, Mary Lauren, Talipineni, Sai Charan, Kostka, Dennis, and Capra, John A.
- Published
- 2019
- Full Text
- View/download PDF
21. aenmd: Annotating escape from nonsense-mediated decay for transcripts with protein-truncating variants
- Author
-
Klonowski, Jonathan, Liang, Qianqian, Akdemir, Zeynep Coban, Lo, Cecilia, and Kostka, Dennis
- Subjects
Article
- Abstract
Summary: DNA changes that cause premature termination codons (PTCs) represent a large fraction of clinically relevant pathogenic genomic variation. Typically, PTCs induce a transcript's degradation by nonsense-mediated mRNA decay (NMD) and render such changes loss-of-function alleles. However, certain PTC-containing transcripts escape NMD and can exert dominant-negative or gain-of-function (DN/GOF) effects. Therefore, systematic identification of human PTC-causing variants and their susceptibility to NMD contributes to the investigation of the role of DN/GOF alleles in human disease. Here we present aenmd, a software for annotating PTC-containing transcript-variant pairs for predicted escape from NMD. aenmd is user-friendly and self-contained. It offers functionality not currently available in other methods and is based on established and experimentally validated rules for NMD escape; the software is designed to work at scale, and to integrate seamlessly with existing analysis workflows. We applied aenmd to variants in the gnomAD, ClinVar, and GWAS catalog databases and report the prevalence of human PTC-causing variants in these databases, and the subset of these that could exert DN/GOF effects via NMD escape. Availability and implementation: aenmd is implemented in the R programming language. Code is available on GitHub as an R package (github.com/kostkalab/aenmd.git), and as a containerized command-line interface (github.com/kostkalab/aenmd_cli.git).
- Published
- 2023
- Full Text
- View/download PDF
22. Vaeda computationally annotates doublets in single-cell RNA sequencing data
- Author
-
Schriever, Hannah and Kostka, Dennis
- Subjects
Statistics and Probability, Biomedical Research, Sequence Analysis, RNA, Gene Expression Profiling, Biochemistry, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, single cell RNA sequencing, doublet annotation, Single-Cell Analysis, Artifacts, Molecular Biology, Software
- Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology’s utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed. Results: We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other Python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of Python-based workflows. Availability and implementation: Vaeda is available at https://github.com/kostkalab/vaeda, and the version used for the results we present here is archived at zenodo (https://doi.org/10.5281/zenodo.7199783). Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2022
- Full Text
- View/download PDF
23. Vaeda computationally annotates doublets in single-cell RNA sequencing data
- Author
-
Schriever, Hannah and Kostka, Dennis
- Published
- 2022
- Full Text
- View/download PDF
24. Non-negative Independent Factor Analysis disentangles discrete and continuous sources of variation in scRNA-seq data
- Author
-
Mao, Weiguang, Pouyan, Maziyar Baran, Kostka, Dennis, and Chikina, Maria
- Published
- 2022
- Full Text
- View/download PDF
25. Gibbon genome and the fast karyotype evolution of small apes
- Author
-
Carbone, Lucia, Alan Harris, R., Gnerre, Sante, Veeramah, Krishna R., Lorente-Galdos, Belen, Huddleston, John, Meyer, Thomas J., Herrero, Javier, Roos, Christian, Aken, Bronwen, Anaclerio, Fabio, Archidiacono, Nicoletta, Baker, Carl, Barrell, Daniel, Batzer, Mark A., Beal, Kathryn, Blancher, Antoine, Bohrson, Craig L., Brameier, Markus, Campbell, Michael S., Capozzi, Oronzo, Casola, Claudio, Chiatante, Giorgia, Cree, Andrew, Damert, Annette, de Jong, Pieter J., Dumas, Laura, Fernandez-Callejo, Marcos, Flicek, Paul, Fuchs, Nina V., Gut, Ivo, Gut, Marta, Hahn, Matthew W., Hernandez-Rodriguez, Jessica, Hillier, LaDeana W., Hubley, Robert, Ianc, Bianca, Izsvák, Zsuzsanna, Jablonski, Nina G., Johnstone, Laurel M., Karimpour-Fard, Anis, Konkel, Miriam K., Kostka, Dennis, Lazar, Nathan H., Lee, Sandra L., Lewis, Lora R., Liu, Yue, Locke, Devin P., Mallick, Swapan, Mendez, Fernando L., Muffato, Matthieu, Nazareth, Lynne V., Nevonen, Kimberly A., O’Bleness, Majesta, Ochis, Cornelia, Odom, Duncan T., Pollard, Katherine S., Quilez, Javier, Reich, David, Rocchi, Mariano, Schumann, Gerald G., Searle, Stephen, Sikela, James M., Skollar, Gabriella, Smit, Arian, Sonmez, Kemal, Hallers, Boudewijn ten, Terhune, Elizabeth, Thomas, Gregg W. C., Ullmer, Brygg, Ventura, Mario, Walker, Jerilyn A., Wall, Jeffrey D., Walter, Lutz, Ward, Michelle C., Wheelan, Sarah J., Whelan, Christopher W., White, Simon, Wilhelm, Larry J., Woerner, August E., Yandell, Mark, Zhu, Baoli, Hammer, Michael F., Marques-Bonet, Tomas, Eichler, Evan E., Fulton, Lucinda, Fronick, Catrina, Muzny, Donna M., Warren, Wesley C., Worley, Kim C., Rogers, Jeffrey, Wilson, Richard K., and Gibbs, Richard A.
- Published
- 2014
- Full Text
- View/download PDF
26. Vaeda computationally annotates doublets in single-cell RNA sequencing data.
- Author
-
Schriever, Hannah and Kostka, Dennis
- Subjects
- *
PROTEIN-protein interactions , *MEDICAL research - Abstract
Motivation Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology's utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed. Results We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of python-based workflows. Availability and implementation Vaeda is available at https://github.com/kostkalab/vaeda , and the version used for the results we present here is archived at zenodo (https://doi.org/10.5281/zenodo.7199783). Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Regulatory networks define phenotypic classes of human stem cell lines
- Author
-
Muller, Franz-Josef, Laurent, Louise C., Kostka, Dennis, Ulitsky, Igor, Williams, Roy, Lu, Christina, Park, In-Hyun, Rao, Mahendra S., Shamir, Ron, Schwartz, Philip H., Schmidt, Nils O., and Loring, Jeanne F.
- Subjects
Evaluation ,Physiological aspects ,Research ,Phenotypes -- Research -- Physiological aspects ,Cell lines -- Research -- Physiological aspects ,Stem cell research -- Evaluation -- Physiological aspects -- Research ,Phenotype -- Research -- Physiological aspects - Abstract
Stem cells are defined as self-renewing cell populations that can differentiate into multiple distinct cell types. However, hundreds of different human cell lines from embryonic, fetal and adult sources have [...]
- Published
- 2008
28. The Role of GC-Biased Gene Conversion in Shaping the Fastest Evolving Regions of the Human Genome
- Author
-
Kostka, Dennis, Hubisz, Melissa J., Siepel, Adam, and Pollard, Katherine S.
- Published
- 2012
- Full Text
- View/download PDF
29. The Importance of Being Cis: Evolution of Orthologous Fish and Mammalian Enhancer Activity
- Author
-
Ritter, Deborah I., Li, Qiang, Kostka, Dennis, Pollard, Katherine S., Guo, Su, and Chuang, Jeffrey H.
- Published
- 2010
- Full Text
- View/download PDF
30. Non-negative Independent Factor Analysis for single cell RNA-seq
- Author
-
Mao, Weiguang, Baran Pouyan, Maziyar, Kostka, Dennis, and Chikina, Maria
- Abstract
Motivation Single cell RNA sequencing (scRNA-seq) enables transcriptional profiling at the level of individual cells. With the emergence of high-throughput platforms, datasets comprising tens of thousands or more cells have become routine, and the technology is having an impact across a wide range of biomedical subject areas. However, scRNA-seq data are high-dimensional and affected by noise, so that scalable and robust computational techniques are needed for meaningful analysis, visualization and interpretation. Specifically, a range of matrix factorization techniques have been employed to aid scRNA-seq data analysis. In this context we note that sources contributing to biological variability between cells can be discrete (or multi-modal, for instance cell-types), or continuous (e.g. pathway activity). However, no current matrix factorization approach is set up to jointly infer such mixed sources of variability. Results To address this shortcoming, we present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that combines features of complementary approaches like Independent Component Analysis (ICA), Principal Component Analysis (PCA), and Non-negative Matrix Factorization (NMF). NIFA simultaneously models uni- and multi-modal latent factors and can so isolate discrete cell-type identity and continuous pathway-level variations into separate components. Similar to NMF, NIFA constrains factor loadings to be non-negative in order to increase biological interpretability. We apply our approach to a range of data sets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA and NMF in terms of cell-type identification and biological interpretability.
Studying an immunotherapy dataset in detail, we show that NIFA identifies biomedically meaningful sources of variation, derive an improved expression signature for regulatory T-cells, and identify a novel myeloid cell subtype associated with treatment response. Overall, NIFA is a general approach advancing scRNA-seq analysis capabilities and it allows researchers to better take advantage of their data. NIFA is available at https://github.com/wgmao/NIFA . Contact mchikina@pitt.edu
- Published
- 2020
- Full Text
- View/download PDF
31. Analyzing gene perturbation screens with nested effects models in R and bioconductor
- Author
-
Fröhlich, Holger, Beißbarth, Tim, Tresch, Achim, Kostka, Dennis, Jacob, Juby, Spang, Rainer, and Markowetz, F.
- Published
- 2008
32. Detecting hierarchical structure in molecular characteristics of disease using transitive approximations of directed graphs
- Author
-
Jacob, Juby, Jentsch, Marcel, Kostka, Dennis, Bentink, Stefan, and Spang, Rainer
- Published
- 2008
33. Nested effects models for high-dimensional phenotyping screens
- Author
-
Markowetz, Florian, Kostka, Dennis, Troyanskaya, Olga G., and Spang, Rainer
- Published
- 2007
34. A physical model for tiling array analysis
- Author
-
Chung, Ho-Ryun, Kostka, Dennis, and Vingron, Martin
- Published
- 2007
35. Additional file 1: of Genome-wide enhancer annotations differ significantly in genomic distribution, evolution, and function
- Author
-
Benton, Mary, Talipineni, Sai, Kostka, Dennis, and Capra, John
- Abstract
Figure S1. Enhancer sets across all contexts considered differ in both number (A) and length (B): Gm12878; Heart. Figure S2. Enhancer sets have low amounts of overlap with each other in bp-wise comparisons. Figure S3. Enhancer sets overlap more than expected by chance in element-wise comparisons. Figure S4. Enhancer sets have low amounts of overlap with each other in element-wise comparisons. Figure S5. Enhancers identified by different methods differ in enrichment for base pair overlap with functional attributes. Figure S6. Enhancer identification strategies recognize different subsets of validated enhancers. Figure S7. K562 enhancer sets have similar low levels of enrichment for activating regions validated by Sharpr-MPRA. Figure S8. Even among the variants in each functional LD block (r2 > 0.9) with the most enhancer set overlap, there is substantial disagreement between enhancer identification methods. Figure S9. Pairwise similarity for GO Molecular Function (MF) enrichments for enhancer sets based on JEME’s putative mappings to target genes in K562 (A), Gm12878 (B), liver (C), and heart (D). Figure S10. Pairwise similarity for GO Biological Process (BP) for enhancer sets based on JEME’s putative mappings to target genes in K562 (A), Gm12878 (B), liver (C), and heart (D). Figure S11. Pairwise similarity for GO Molecular Function (MF) enrichments from GREAT for liver enhancer sets. Figure S12. There is low pairwise similarity between GO Molecular Function (MF) enrichments calculated with GREAT for enhancer sets in the same context. Figure S13. There is low pairwise similarity between GO Biological Process (BP) enrichments calculated with GREAT for enhancer sets in the same context. Figure S14. Clustering enhancer sets on similarity of enriched transcription factor binding motifs illustrates different clustering of methods. Figure S15. 
Regions identified as enhancers by multiple methods do not have higher confidence scores than regions identified by a single method. Figure S16. Score distributions for K562 enhancer sets are similar between regions identified as enhancers by a single method and those identified by multiple methods: (A) H3K27acPlusH3K4me1, (B) H3K27acMinusH3K4me3, (C) DNasePlusHistone, (D) EncodeEnhancerlike, and (E) FANTOM. Figure S17. Score distributions for Gm12878 enhancer sets are similar between regions identified as enhancers by a single method and those identified by multiple methods: (A) H3K27acPlusH3K4me1, (B) H3K27acMinusH3K4me3, (C) DNasePlusHistone, (D) EncodeEnhancerlike, and (E) FANTOM. Figure S18. Score distributions for heart enhancer sets are similar between regions identified as enhancers by a single method and those identified by multiple methods: (A) H3K27acPlusH3K4me1, (B) H3K27acMinusH3K4me3, (C) DNasePlusHistone, (D) EncodeEnhancerlike, and (E) FANTOM. Figure S19. Enrichment for functional attributes is not significantly different between regions identified as enhancers by a single method and those identified by multiple methods when focusing on the top 100 predictions from each method. Figure S20. Same as Fig. S19, but considering the top 500 predictions from each method. Table S1. The average distance (in bp) to the closest TSS over all enhancers identified by each method in each cellular context. Table S2. Summary statistics for pairwise percent overlap, both in a base pair and element-wise comparison. Table S3. Number of observed VISTA heart positive and VISTA negative overlaps for each context and enhancer identification method. Table S4. Curated list of relevant GWAS phenotypes for liver (n = 50) and heart (n = 169). Table S5. Enrichments for overlap with context-specific SNPs in liver and heart. Table S6. Number of overlapping GWAS SNPs per enhancer identification method and context. Table S7. 
Enrichments for overlap with context-specific eQTL in liver and heart. Table S8. Number of overlapping GTEx eQTL per enhancer identification method and context. Table S9. Number of target genes mapped to each enhancer set by JEME. For K562 and Gm12878, p300 and GRO-cap are not included in this mapping. Table S11. Number of enhancers removed by length filtering. (DOCX 36740 kb)
- Published
- 2019
- Full Text
- View/download PDF
36. scds: computational annotation of doublets in single-cell RNA sequencing data
- Author
-
Bais, Abha S, primary and Kostka, Dennis, additional
- Published
- 2019
- Full Text
- View/download PDF
37. Large-scale inference of competing endogenous RNA networks with sparse partial correlation
- Author
-
List, Markus, primary, Dehghani Amirabad, Azim, additional, Kostka, Dennis, additional, and Schulz, Marcel H, additional
- Published
- 2019
- Full Text
- View/download PDF
38. Von Hippel-Lindau Acts as a Metabolic Switch Controlling Nephron Progenitor Differentiation
- Author
-
Cargill, Kasey, primary, Hemker, Shelby L., additional, Clugston, Andrew, additional, Murali, Anjana, additional, Mukherjee, Elina, additional, Liu, Jiao, additional, Bushnell, Daniel, additional, Bodnar, Andrew J., additional, Saifudeen, Zubaida, additional, Ho, Jacqueline, additional, Bates, Carlton M., additional, Kostka, Dennis, additional, Goetzman, Eric S., additional, and Sims-Lucas, Sunder, additional
- Published
- 2019
- Full Text
- View/download PDF
39. Loss of miR-17~92 results in dysregulation of Cftr in nephron progenitors
- Author
-
Phua, Yu Leng, primary, Chen, Kevin Hong, additional, Hemker, Shelby L., additional, Marrone, April K., additional, Bodnar, Andrew J., additional, Liu, Xiaoning, additional, Clugston, Andrew, additional, Kostka, Dennis, additional, Butterworth, Michael B., additional, and Ho, Jacqueline, additional
- Published
- 2019
- Full Text
- View/download PDF
40. Small non-coding RNA expression in mouse nephrogenic mesenchymal progenitors
- Author
-
Phua, Yu Leng, primary, Clugston, Andrew, additional, Chen, Kevin Hong, additional, Kostka, Dennis, additional, and Ho, Jacqueline, additional
- Published
- 2018
- Full Text
- View/download PDF
41. Random forest based similarity learning for single cell RNA sequencing data
- Author
-
Pouyan, Maziyar Baran, primary and Kostka, Dennis, additional
- Published
- 2018
- Full Text
- View/download PDF
42. scds: computational annotation of doublets in single-cell RNA sequencing data.
- Author
-
Bais, Abha S and Kostka, Dennis
- Subjects
- *
CELL analysis , *ANNOTATIONS , *GENE expression , *MEDICAL research , *RNA sequencing - Abstract
Motivation Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study's conclusions, and therefore computational strategies for the identification of doublets are needed. Results With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. Availability and implementation scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
43. Loss of miR-17~92 results in dysregulation of Cftr in nephron progenitors.
- Author
-
Yu Leng Phua, Kevin Hong Chen, Hemker, Shelby L., Marrone, April K., Bodnar, Andrew J., Xiaoning Liu, Clugston, Andrew, Kostka, Dennis, Butterworth, Michael B., and Jacqueline Ho
- Abstract
We have previously demonstrated that loss of miR-17~92 in nephron progenitors in a mouse model results in renal hypodysplasia and chronic kidney disease. Clinically, decreased congenital nephron endowment because of renal hypodysplasia is associated with an increased risk of hypertension and chronic kidney disease, and this is at least partly dependent on the self-renewal of nephron progenitors. Here, we present evidence for a novel molecular mechanism regulating the self-renewal of nephron progenitors and congenital nephron endowment by the highly conserved miR-17~92 cluster. Whole transcriptome sequencing revealed that nephron progenitors lacking this cluster demonstrated increased Cftr expression. We showed that one member of the cluster, miR-19b, is sufficient to repress Cftr expression in vitro and that perturbation of Cftr activity in nephron progenitors results in impaired proliferation. Together, these data suggest that miR-19b regulates Cftr expression in nephron progenitors, with this interaction playing a role in appropriate nephron progenitor self-renewal during kidney development to generate normal nephron endowment. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
44. Correction: DAF-16 and TCER-1 Facilitate Adaptation to Germline Loss by Restoring Lipid Homeostasis and Repressing Reproductive Physiology in C. elegans
- Author
-
Amrit, Francis Raj Gandhi, primary, Steenkiste, Elizabeth Marie, additional, Ratnappan, Ramesh, additional, Chen, Shaw-Wen, additional, McClendon, T. Brooke, additional, Kostka, Dennis, additional, Yanowitz, Judith, additional, Olsen, Carissa Perez, additional, and Ghazi, Arjumand, additional
- Published
- 2016
- Full Text
- View/download PDF
45. Loss of miR-17~92 results in dysregulation of Cftr in nephron progenitors
- Author
-
Phua, Yu Leng, Chen, Kevin Hong, Hemker, Shelby L., Marrone, April K., Bodnar, Andrew J., Liu, Xiaoning, Clugston, Andrew, Kostka, Dennis, Butterworth, Michael B., and Ho, Jacqueline
- Abstract
We have previously demonstrated that loss of miR-17~92 in nephron progenitors in a mouse model results in renal hypodysplasia and chronic kidney disease. Clinically, decreased congenital nephron endowment because of renal hypodysplasia is associated with an increased risk of hypertension and chronic kidney disease, and this is at least partly dependent on the self-renewal of nephron progenitors. Here, we present evidence for a novel molecular mechanism regulating the self-renewal of nephron progenitors and congenital nephron endowment by the highly conserved miR-17~92 cluster. Whole transcriptome sequencing revealed that nephron progenitors lacking this cluster demonstrated increased Cftr expression. We showed that one member of the cluster, miR-19b, is sufficient to repress Cftr expression in vitro and that perturbation of Cftr activity in nephron progenitors results in impaired proliferation. Together, these data suggest that miR-19b regulates Cftr expression in nephron progenitors, with this interaction playing a role in appropriate nephron progenitor self-renewal during kidney development to generate normal nephron endowment.
- Published
- 2019
- Full Text
- View/download PDF
46. DAF-16 and TCER-1 Facilitate Adaptation to Germline Loss by Restoring Lipid Homeostasis and Repressing Reproductive Physiology in C. elegans
- Author
-
Amrit, Francis Raj Gandhi, primary, Steenkiste, Elizabeth Marie, additional, Ratnappan, Ramesh, additional, Chen, Shaw-Wen, additional, McClendon, T. Brooke, additional, Kostka, Dennis, additional, Yanowitz, Judith, additional, Olsen, Carissa Perez, additional, and Ghazi, Arjumand, additional
- Published
- 2016
- Full Text
- View/download PDF
47. Methodology for exploring and communicating molecular characteristics of disease
- Author
-
Kostka, Dennis Alexander
- Subjects
Data Mining ,000 Computer science, information science, general works::000 Computer science, knowledge, systems::004 Data processing ,Computer science ,Pattern Recognition ,Microarray ,Classification ,Molecular Diagnosis - Abstract
Contents, Acknowledgements and Introduction i 1. Finding Molecular Characteristics of Disease 1 * Motivation * Supervised Classification of Patients * Discussion and Summary 2. Communicating Molecular Characteristics of Disease 19 * Motivation * Preprocessing of Oligonucleotide Microarrays * Documentation of Signatures * Application to Data * Compatibility of External Patients to Core Data * Discussion and Summary 3. Exploring Molecular Characteristics of Disease 45 * Motivation * The dcoex Algorithm * Application to Data * Discussion and Summary Summary and Bibliography 67 Appendices 85, Microarray data characterizes cells on the transcriptional level. Prominent applications of microarray technology in a clinical setting are the molecular diagnosis of patients and the discovery of disease subtypes by patient stratification (clustering). Lists of differentially expressed genes are often used to guide biological intuition. In general, the data can be utilized to infer novel biological hypotheses by means of pattern mining and to refine or confirm existing knowledge. This thesis contains methodological contributions to both settings. It is composed of three chapters. The first chapter describes statistical learning techniques, which are frequently applied to microarray data with the goal of obtaining rules for molecular diagnosis. The focus lies on characteristics arising from the specific nature of high dimensional microarray data. This chapter concisely integrates concepts, algorithms and practical aspects of microarray data analysis that are usually found in distinct fields of the literature. It provides the theoretical foundation of the other chapters. The second chapter is concerned with the unambiguous documentation of a diagnostic molecular signature or, equivalently, with the unequivocal characterization of disease or subtype of disease.
The motivation to address documentation and communication of molecular signatures is a practical one: Microarray based gene expression signatures have the potential to be powerful tools for patient stratification and diagnosis of disease. But before they can affect clinical practice they need to be communicated to other health care centers with data for independent validation. External validation of a signature can only be meaningful if the new data is transformed to a scale compatible with the original one the signature is tuned to. This scale, in turn, depends on the initial preprocessing applied in the signature deriving study. It needs to be communicated alongside with the signature. Chapter two formalizes this requirement and contains scale adjusting transformations for two popular preprocessing schemes. Using eight clinical microarray data sets I am able to show significantly increased consistency and stability of molecular diagnoses as compared to standard documentation procedures. This underlines the key point of the chapter: Data preprocessing has to be taken into account when documenting molecular characteristics of disease. The third chapter introduces the dcoex algorithm, a method designed to utilize microarray data to reveal groups of genes losing coregulation between two phenotypes. Information about differentially coregulated genes can not only provide a molecular characterization of the phenotypes; it also provides focused information which is useful to generate hypotheses about biological mechanisms underlying the phenotypical differentiation. This chapter introduces the concept, implements an algorithm for detection and demonstrates the biological plausibility of differentially coexpressed genes. In a data set on childhood leukemia we find a biologically plausible group of genes differentially coexpressed between cytogenetically normal children and children bearing a Philadelphia chromosome.
After assessing robustness and statistical significance of our findings we conclude that dcoex constitutes a new analysis tool enabling the exploration of differential coexpression patterns., This thesis is concerned with the characterization of diseases using gene expression data. Such data represent cells at the molecular level and can be used to describe diseases in two ways: on the one hand, known diseases can be diagnosed more precisely and more reliably. On the other hand, one can try either to discover new disease entities in recurring expression patterns, or to infer the biological and medical causes of known diseases from such patterns. This thesis contains methodologically new approaches for both scenarios. An introduction, which among other things briefly outlines microarray technology, is followed by a further introductory chapter. It describes methods from statistical learning theory that can be used to derive schemes (or molecular signatures) for diagnosis from example data. The presentation is tailored to the application of statistical methods to microarray data, and the chapter forms the theoretical foundation of the work that follows. The topic of the second chapter is the unambiguous documentation of molecular signatures once they have been derived. Documenting an expression signature is a necessary step if it is to be exchanged between scientists and research institutions. Such an exchange, however, must precede the tests and validations of a signature, which in turn are indispensable for clinical use. We present two methods that complement common data preprocessing strategies and demonstrate a significant increase in the consistency of diagnoses on several data sets.
The third chapter introduces the concept of differential co-expression and the associated dcoex algorithm. A group of differentially coexpressed genes has the property of being coherently expressed in samples of one phenotype while losing this coherence in samples of another phenotype. The dcoex algorithm is a method for finding such groups of differentially coexpressed genes in data sets, solving a combinatorial optimization problem heuristically. Groups of differentially coexpressed genes can not only contribute to the molecular characterization of different phenotypes. Information derived from the gene groups can also be used to formulate focused biological hypotheses. We demonstrate this on a leukemia data set.
- Published
- 2007
- Full Text
- View/download PDF
48. Modeling DNA methylation dynamics with approaches from phylogenetics
- Author
-
Capra, John A., primary and Kostka, Dennis, additional
- Published
- 2014
- Full Text
- View/download PDF
49. MicroRNA-17~92 Is Required for Nephrogenesis and Renal Function
- Author
-
Marrone, April K., primary, Stolz, Donna B., additional, Bastacky, Sheldon I., additional, Kostka, Dennis, additional, Bodnar, Andrew J., additional, and Ho, Jacqueline, additional
- Published
- 2014
- Full Text
- View/download PDF
50. Integrating Diverse Datasets Improves Developmental Enhancer Prediction
- Author
-
Erwin, Genevieve D., primary, Oksenberg, Nir, additional, Truty, Rebecca M., additional, Kostka, Dennis, additional, Murphy, Karl K., additional, Ahituv, Nadav, additional, Pollard, Katherine S., additional, and Capra, John A., additional
- Published
- 2014
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library