24 results on '"Yontao Lu"'
Search Results
2. Development and evaluation of a transfusion medicine genome wide genotyping array
- Author
-
Mark Seielstad, Adam S. Butterworth, Stacy M Endres-Dighe, Connie M. Westhoff, Aarash Bordbar, Yontao Lu, Steve Kleinman, Brian Custer, Brendan J. Keating, Carolyn Hoppe, Tamir Kanias, Yuelong Guo, Michael P. Busch, Alan E. Mast, and Grier P. Page
- Subjects
medicine.medical_specialty ,Immunology ,Transfusion medicine ,Genome-wide association study ,Single-nucleotide polymorphism ,Hematology ,Computational biology ,030204 cardiovascular system & hematology ,Biology ,Article ,Human genetics ,Minor allele frequency ,03 medical and health sciences ,0302 clinical medicine ,medicine ,Immunology and Allergy ,Genotyping ,Allele frequency ,Imputation (genetics) ,030215 immunology - Abstract
Background: Many aspects of transfusion medicine are affected by genetics. Current single-nucleotide polymorphism (SNP) arrays are limited in the number of targets that can be interrogated and cannot detect all variation of interest. We designed a transfusion medicine array (TM-Array) for study of both common and rare transfusion-relevant variations in genetically diverse donor and recipient populations. Study design and methods: The array was designed by conducting extensive bioinformatics mining and consulting experts to identify genes and genetic variation related to a wide range of clinically relevant and research-related transfusion medicine topics. Copy number polymorphisms were added in the alpha globin, beta globin, and Rh gene clusters. Results: The final array contains approximately 879,000 SNP and copy number polymorphism markers. Over 99% of SNPs were called reliably. Technical replication showed the array to be robust and reproducible, with an error rate less than 0.03%. The array also had a very low Mendelian error rate (average parent-child trio accuracy of 0.9997). Blood group results were in concordance with serology testing results, and the array accurately identifies rare variants (minor allele frequency of 0.5%). The array achieved high genome-wide imputation coverage for African-American (97.5%), Hispanic (96.1%), East Asian (94.6%), and white (96.1%) genomes at a minor allele frequency of 5%. Conclusions: A custom array for transfusion medicine research has been designed and evaluated. It gives wide coverage and accurate identification of rare SNPs in diverse populations. The TM-Array will be useful for future genetic studies in the diverse fields of transfusion medicine research.
- Published
- 2018
3. The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits
- Author
-
Kyungheon Yoon, Yun Kyoung Kim, Young-Jin Kim, Min Young Park, Jae Kyung Park, Yontao Lu, Bong-Jo Kim, Sohee Han, Daesub Song, Sanghoon Moon, Dong Mun Shin, Mi Yeong Hwang, Taejoon Park, Hye-Mi Jang, and Jong Eun Lee
- Subjects
0301 basic medicine ,Adult ,Genotype ,Sequencing data ,Mutation, Missense ,lcsh:Medicine ,Genome-wide association study ,Biology ,Polymorphism, Single Nucleotide ,Article ,03 medical and health sciences ,0302 clinical medicine ,Republic of Korea ,Missense mutation ,Humans ,Alanine aminotransferase ,lcsh:Science ,ALDH2 ,Aged ,Biological Specimen Banks ,Genetics ,Multidisciplinary ,Korean population ,Genome, Human ,lcsh:R ,Genetic Variation ,Reproducibility of Results ,Middle Aged ,Biobank ,030104 developmental biology ,Blood ,Genetic Loci ,lcsh:Q ,030217 neurology & neurosurgery ,Imputation (genetics) ,Genome-Wide Association Study - Abstract
We introduce the design and implementation of a new array, the Korea Biobank Array (referred to as KoreanChip), optimized for the Korean population and demonstrate findings from GWAS of blood biochemical traits. KoreanChip comprised >833,000 markers including >247,000 rare-frequency or functional variants estimated from >2,500 sequencing data in Koreans. Of the 833 K markers, 208 K functional markers were directly genotyped. Particularly, >89 K markers were presented in East Asians. KoreanChip achieved higher imputation performance owing to the excellent genomic coverage of 95.38% for common and 73.65% for low-frequency variants. From GWAS (Genome-wide association study) using 6,949 individuals, 28 associations were successfully recapitulated. Moreover, 9 missense variants were newly identified, of which we identified new associations between a common population-specific missense variant, rs671 (p.Glu457Lys) of ALDH2, and two traits including aspartate aminotransferase (P = 5.20 × 10⁻¹³) and alanine aminotransferase (P = 4.98 × 10⁻⁸). Furthermore, two novel missense variants of GPT with rare frequency in East Asians but extreme rarity in other populations were associated with alanine aminotransferase (rs200088103; p.Arg133Trp, P = 2.02 × 10⁻⁹ and rs748547625; p.Arg143Cys, P = 1.41 × 10⁻⁶). These variants were successfully replicated in 6,000 individuals (P = 5.30 × 10⁻⁸ and P = 1.24 × 10⁻⁶). GWAS results suggest the promising utility of KoreanChip with a substantial number of damaging variants to identify new population-specific disease-associated rare/functional variants.
- Published
- 2019
4. Prediction of epigenetically regulated genes in breast cancer cell lines.
- Author
-
Leandro A. Loss, Anguraj Sadanandam, Steffen Durinck, Shivani Nautiyal, Diane Flaucher, Victoria E. H. Carlton, Martin Moorhead, Yontao Lu, Joe W. Gray, Malek Faham, Paul T. Spellman, and Bahram Parvin
- Published
- 2010
5. Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies
- Author
-
Monkol Lek, Samir H Al-Mueilo, Alhusain J. Alzahrani, Kelly A. Thomas, Dimitri S. Monos, Daniel G. MacArthur, Elena Carrigan, Ajay K. Israni, Eyas Mukhtar, Konrad J. Karczewski, Shefali S. Verma, Marylyn D. Ritchie, Brendan J. Keating, Hui Gao, Teresa Webster, Malek Kamoun, Ana Gonzalez, Jessica van Setten, Paul I.W. de Bakker, Laura Steel, Aubree Himes, Kim M. Olthoff, Pamala A. Jacobson, Maede Mohebnasab, Barbara Murphy, Kelsey M. Llyod, Hareesh R. Chandrupatla, Suganthi Balasubramanian, Takesha Lee, James Snyder, Abhinav Gangasani, Baolin Wu, B. Chang, Weihua Guan, Yun Li, Folkert W. Asselbergs, Kelly A. Birdwell, Matthew B. Lanktree, Abraham Shaked, Andrew Pasquier, Cisca Wijmenga, Cuiping Hou, Abigail Colasacco, Chanel Wong, Yontao Lu, Daniel E. McGinn, William S. Oetting, Fahad Al-Muhanna, Amein K. Al-Ali, Abdullah Akdere, Michael B. Miller, Jacob van Houten, David S. Schladt, Hongzhi Cao, Abdullah M. Al-Rubaish, Randy Phillips, Vinicius Tragante, Hakon Hakonarson, Nikhil Nair, Pablo García-Pavía, James Garifallou, Toumy Guettouche, Zach Michaud, Michael V. Holmes, Tiancheng Wang, Reina Yu, and Groningen Institute for Gastro Intestinal Genetics and Immunology (3GI)
- Subjects
SNP ARRAY ,DNA Copy Number Variations ,Genotype ,KIDNEY-TRANSPLANTATION ,Concordance ,Population ,MISMATCH ,Genome-wide association study ,030230 surgery ,Biology ,Research Support ,Polymorphism, Single Nucleotide ,N.I.H ,03 medical and health sciences ,0302 clinical medicine ,Receptors, KIR ,Research Support, N.I.H., Extramural ,HLA Antigens ,MANAGEMENT ,IMPUTATION ,Genetics ,Journal Article ,Humans ,Genetics(clinical) ,International HapMap Project ,education ,Non-U.S. Gov't ,Genotyping ,Molecular Biology ,Genetics (clinical) ,POLYMORPHISMS ,POPULATION ,030304 developmental biology ,0303 health sciences ,education.field_of_study ,Research ,Research Support, Non-U.S. Gov't ,Extramural ,GENE ,3. Good health ,SNP genotyping ,Transplantation ,RECIPIENTS ,REJECTION ,Molecular Medicine ,Imputation (genetics) ,Genome-Wide Association Study - Abstract
Background: In addition to HLA genetic incompatibility, non-HLA differences between donors and recipients of transplantation leading to allograft rejection are now becoming evident. We aimed to create a unique genome-wide platform to facilitate genomic research studies in transplant-related studies. We designed a genome-wide genotyping tool based on the most recent human genomic reference datasets, and included customization for known and potentially relevant metabolic and pharmacological loci relevant to transplantation. Methods: We describe here the design and implementation of a customized genome-wide genotyping array, the ‘TxArray’, comprising approximately 782,000 markers with tailored content for deeper capture of variants across HLA, KIR, pharmacogenomic, and metabolic loci important in transplantation. To test concordance and genotyping quality, we genotyped 85 HapMap samples on the array, including eight trios. Results: We show low Mendelian error rates and high concordance rates for HapMap samples (average parent-parent-child heritability of 0.997, and concordance of 0.996). We performed genotype imputation across autosomal regions, masking directly genotyped SNPs to assess imputation accuracy and report an accuracy of >0.962 for directly genotyped SNPs. We demonstrate much higher capture of the natural killer cell immunoglobulin-like receptor (KIR) region versus comparable platforms. Overall, we show that the genotyping quality and coverage of the TxArray is very high when compared to reference samples and to other genome-wide genotyping platforms. Conclusions: We have designed a comprehensive genome-wide genotyping tool which enables accurate association testing and imputation of ungenotyped SNPs, facilitating powerful and cost-effective large-scale genotyping of transplant-related studies.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0211-x) contains supplementary material, which is available to authorized users.
- Published
- 2015
6. Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort
- Author
-
Neil Risch, Pui-Yan Kwok, Yiping Zhan, Stephen K. Van Den Eeden, Eunice Wan, Brad Dispensa, Dana Ludwig, Simon Wong, Gangwu Mei, Michael Mittman, Eric Jorgenson, Sarah Rowell, Tanu Shenoy, Charles P. Quesenberry, Stephanie Hesselson, Lisa A. Croen, Michael H. Shapero, Lawrence Walter, Lawrence H. Kushi, Thomas J. Hoffmann, Teresa Webster, Gurpreet K. Mathauda, Carlos Iribarren, Carol P. Somkin, Yang Cao, Jeremy Gollub, Chia Zau, Sheryl Connell, Mohini Patil, Ling Shen, Marianne Sadler, Mark N. Kvale, Catherine Schaefer, David Chan, Rachel A. Whitmer, Richard Lao, Lori C. Sakoda, Dilrini K. Ranatunga, Andrea Finn, William B. McGuire, David Smethurst, Yontao Lu, Sunita Miles, and Jasmin L Eshragh
- Subjects
Adult ,Male ,Quality Control ,saliva DNA ,Aging ,Genotyping Techniques ,Concordance ,Biology ,Investigations ,Polymorphism, Single Nucleotide ,Cohort Studies ,Genotype ,Genetics ,Humans ,genome-wide genotyping ,Polymorphism ,Genotyping ,Oligonucleotide Array Sequence Analysis ,Molecular Epidemiology ,Human Genome ,Computational Biology ,Single Nucleotide ,SNP genotyping ,Genetic epidemiology ,Affymetrix Axiom ,GERA cohort ,Health ,Cohort ,Female ,Generic health relevance ,Cohort study ,Developmental Biology - Abstract
The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California—San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1–95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.
- Published
- 2015
7. Additional file 2: Figure S1. of Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies
- Author
-
Li, Yun, Setten, Jessica Van, Verma, Shefali, Yontao Lu, Holmes, Michael, Gao, Hui, Monkol Lek, Nair, Nikhil, Hareesh Chandrupatla, Baoli Chang, Karczewski, Konrad, Wong, Chanel, Maede Mohebnasab, Eyas Mukhtar, Phillips, Randy, Tragante, Vinicius, Cuiping Hou, Steel, Laura, Takesha Lee, Garifallou, James, Toumy Guettouche, Hongzhi Cao, Weihua Guan, Himes, Aubree, Houten, Jacob Van, Pasquier, Andrew, Yu, Reina, Carrigan, Elena, Miller, Michael, Schladt, David, Akdere, Abdullah, Gonzalez, Ana, Llyod, Kelsey, McGinn, Daniel, Abhinav Gangasani, Michaud, Zach, Colasacco, Abigail, Snyder, James, Thomas, Kelly, Tiancheng Wang, Baolin Wu, Alhusain Alzahrani, Amein Al-Ali, Al-Muhanna, Fahad, Al-Rubaish, Abdullah, Al-Mueilo, Samir, Monos, Dimitri, Murphy, Barbara, Olthoff, Kim, Wijmenga, Cisca, Webster, Teresa, Kamoun, Malek, Suganthi Balasubramanian, Lanktree, Matthew, Oetting, William, Garcia-Pavia, Pablo, MacArthur, Daniel, Bakker, Paul De, Hakon Hakonarson, Birdwell, Kelly, Jacobson, Pamala, Ritchie, Marylyn, Asselbergs, Folkert, Israni, Ajay, Shaked, Abraham, and Keating, Brendan
- Abstract
TxArray transplant-specific modular contents. (PDF 140 kb)
- Published
- 2015
8. Additional file 1: Table S1. of Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies
- Author
-
Li, Yun, Setten, Jessica Van, Verma, Shefali, Yontao Lu, Holmes, Michael, Gao, Hui, Monkol Lek, Nair, Nikhil, Hareesh Chandrupatla, Baoli Chang, Karczewski, Konrad, Wong, Chanel, Maede Mohebnasab, Eyas Mukhtar, Phillips, Randy, Tragante, Vinicius, Cuiping Hou, Steel, Laura, Takesha Lee, Garifallou, James, Toumy Guettouche, Hongzhi Cao, Weihua Guan, Himes, Aubree, Houten, Jacob Van, Pasquier, Andrew, Yu, Reina, Carrigan, Elena, Miller, Michael, Schladt, David, Akdere, Abdullah, Gonzalez, Ana, Llyod, Kelsey, McGinn, Daniel, Abhinav Gangasani, Michaud, Zach, Colasacco, Abigail, Snyder, James, Thomas, Kelly, Tiancheng Wang, Baolin Wu, Alhusain Alzahrani, Amein Al-Ali, Al-Muhanna, Fahad, Al-Rubaish, Abdullah, Al-Mueilo, Samir, Monos, Dimitri, Murphy, Barbara, Olthoff, Kim, Wijmenga, Cisca, Webster, Teresa, Kamoun, Malek, Suganthi Balasubramanian, Lanktree, Matthew, Oetting, William, Garcia-Pavia, Pablo, MacArthur, Daniel, Bakker, Paul De, Hakon Hakonarson, Birdwell, Kelly, Jacobson, Pamala, Ritchie, Marylyn, Asselbergs, Folkert, Israni, Ajay, Shaked, Abraham, and Keating, Brendan
- Abstract
Tagging and coverage of MHC region markers. Table S2: Tagging and coverage of Tx-specific genes. Table S3: Untranslated regions (UTRs) considered in the TxArray design. Table S4: Loss-of-function variants included in the TxArray. Table S5: Copy number polymorphisms (CNPs) and variations (CNVs) included in the TxArray. (DOCX 54 kb)
- Published
- 2015
9. The UCSC Genome Browser Database
- Author
-
Mark Diekhans, Yontao Lu, Terrence S. Furey, Michael L. Schwartz, Donna Karolchik, Charles W. Sugnet, Daryl J. Thomas, Krishna M. Roskin, W. J. Kent, Robert Baertsch, David Haussler, R. J. Weber, and Angie S. Hinrichs
- Subjects
Whole genome sequencing ,Database ,Genome, Human ,Flat file database ,Information Storage and Retrieval ,Genomics ,Articles ,Genome browser ,Biology ,computer.software_genre ,Genome ,California ,Mice ,ComputingMethodologies_PATTERNRECOGNITION ,Databases, Genetic ,Data file ,Genetics ,Animals ,Database Management Systems ,Humans ,DECIPHER ,Human genome ,computer - Abstract
The University of California Santa Cruz (UCSC) Genome Browser Database is an up to date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.
- Published
- 2003
10. Accuracy of profiling of circulating tumor DNA for CRC MRD and monitoring using NGS technology equipped with concatemer-based error correction
- Author
-
Yontao Lu, Grace Q. Zhao, Lingchen Guo, Zhaohui Johnny Sun, Kang Ying, Ming Li, Shengrong Lin, Paul Tang, Yingyu Wang, Yi Huang, Malek Faham, Dana Yeo, Xin-Xing Li, Zhiqian Hu, Li Weng, WeiWei Xiao, Min Li, and Hongyan Wang
- Subjects
Cancer Research ,chemistry.chemical_compound ,Oncology ,chemistry ,Circulating tumor DNA ,business.industry ,Concatemer ,Cancer research ,Medicine ,In patient ,business ,Minimal residual disease - Abstract
e23067 Background: Circulating tumor DNA (ctDNA) is a promising biomarker for the detection of minimal residual disease and monitoring treatment in patients with CRC. The performance demands of any technology used for this purpose, however, are tremendous. Here we aim to develop a high-performance multiplex NGS platform suitable for cancer MRD using ctDNA. Methods: We have developed Firefly, an NGS method capable of detecting low-frequency variants with high precision in plasma cfDNA. In our protocol, denatured double-stranded cfDNA is circularized and converted into long tandem repeats using rolling-circle amplification, enabling consensus-based concatemer error correction. We demonstrated Firefly’s performance sensitivity and specificity by testing our technology on cfDNA samples with known variant frequencies and cfDNA collected from healthy individuals (n = 82). Further analysis of Firefly as a tool for MRD and treatment monitoring was performed by tracking ctDNA mutation profile concordance between 81 CRC tumor samples and their corresponding plasma samples collected from patients before and after treatment. Results: Performance sensitivity of Firefly NGS was 0.1%, with an error rate of 1 in 1 million for 20 ng of input ctDNA. Concordance analysis was performed on CRC tumor/plasma pairings derived from patients with CRC using Accu-Act, a 61-gene assay. The number of tumor-matching mutations detected in plasma varied greatly on a per-patient basis (range, 0-28). Pre- and post-treatment ctDNA profiling was performed on all 81 patients included in our study (surgery, n = 56; chemotherapy/radiotherapy, n = 30). Among patients who underwent surgery, 46% had detectable tumor-matching mutations in their plasma. Among patients who received neoadjuvant therapy, 70% showed ctDNA fluctuations consistent with tumor reduction based on surgical tumor regression grades evaluation (TRG1-3).
Conclusions: We report a novel ultra-accurate NGS-based ctDNA assay suitable for MRD and monitoring in CRC patients. Firefly should ultimately make a significant contribution in the development of personalized cancer treatment.
- Published
- 2017
11. The Diversity of REcent and Ancient huMan (DREAM): A New Microarray for Genetic Anthropology and Genealogy, Forensics, and Personalized Medicine.
- Author
-
Elhaik, Eran, Yusuf, Leeban, Anderson, Ainan I. J., Pirooznia, Mehdi, Arnellos, Dimitrios, Vilshansky, Gregory, Ercal, Gunes, Yontao Lu, Webster, Teresa, Baird, Michael L., and Esposito, Umberto
- Subjects
GENETIC polymorphisms ,HUMAN population genetics ,ANTHROPOLOGY ,GENEALOGY ,DNA analysis - Abstract
The human population displays wide variety in demographic history, ancestry, content of DNA derived from hominins or ancient populations, adaptation, traits, copy number variation, drug response, and more. These polymorphisms are of broad interest to population geneticists, forensics investigators, and medical professionals. Historically, much of that knowledge was gained from population survey projects. Although many commercial arrays exist for genome-wide single-nucleotide polymorphism genotyping, their design specifications are limited and they do not allow a full exploration of biodiversity. We thereby aimed to design the Diversity of REcent and Ancient huMan (DREAM)--an all-inclusive microarray that would allow both identification of known associations and exploration of standing questions in genetic anthropology, forensics, and personalized medicine. DREAM includes probes to interrogate ancestry informative markers obtained from over 450 human populations, over 200 ancient genomes, and 10 archaic hominins. DREAM can identify 94% and 61% of all known Y and mitochondrial haplogroups, respectively, and was vetted to avoid interrogation of clinically relevant markers. To demonstrate its capabilities, we compared its FST distributions with those of the 1000 Genomes Project and commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, DREAM's autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. DREAM performances are further illustrated in biogeographical, identical by descent, and copy number variation analyses. In summary, with approximately 800,000 markers spanning nearly 2,000 genes, DREAM is a useful tool for genetic anthropology, forensic, and personalized medicine studies.
- Published
- 2017
12. High-throughput method for analyzing methylation of CpGs in targeted genomic regions
- Author
-
Paul Berg, Michael N. Mindrinos, James S. Ireland, Joe W. Gray, Martin Moorhead, Diane Flaucher, Yontao Lu, Malek Faham, Victoria Carlton, Paul T. Spellman, and Shivani Nautiyal
- Subjects
Genetics ,Multidisciplinary ,Genome ,Promoter ,Cell Differentiation ,Methylation ,DNA ,Biology ,Biological Sciences ,DNA Methylation ,Housekeeping gene ,chemistry.chemical_compound ,CpG site ,chemistry ,DNA methylation ,Illumina Methylation Assay ,Humans ,RNA-Directed DNA Methylation ,Dinucleoside Phosphates - Abstract
A unique microarray-based method for determining the extent of DNA methylation has been developed. It relies on a selective enrichment of the regions to be assayed by target amplification by capture and ligation (mTACL). The assay is quantitatively accurate, relatively precise, and lends itself to high-throughput determination using nanogram amounts of DNA. The measurements using mTACLs are highly reproducible and in excellent agreement with those obtained by sequencing (r = 0.94). In the present work, the methylation status of >145,000 CpGs from 5,472 promoters in 221 samples was measured. The methylation levels of nearby CpGs are correlated, but the correlation falls off dramatically over several hundred base pairs. In some instances, nearby CpGs have very different levels of methylation. Comparison of normal and tumor samples indicates that in tumors, the promoter regions of genes involved in differentiation and signaling are preferentially hypermethylated, whereas those of housekeeping genes remain hypomethylated. mTACL is a platform for profiling the state of methylation of a large number of CpG in many samples in a cost-effective fashion, and is capable of scaling to much larger numbers of CpGs than those collected here.
- Published
- 2010
13. Prediction of epigenetically regulated genes in breast cancer cell lines
- Author
-
Steffen Durinck, Joe W. Gray, Diane Flaucher, Bahram Parvin, Paul T. Spellman, Anguraj Sadanandam, Yontao Lu, Malek Faham, Shivani Nautiyal, Leandro A. Loss, Victoria Carlton, and Martin Moorhead
- Subjects
Breast Neoplasms ,Biology ,Methodology article ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Collagen Type I ,Epigenesis, Genetic ,03 medical and health sciences ,0302 clinical medicine ,Antigens, Neoplasm ,Structural Biology ,Cell Line, Tumor ,Humans ,Genes, Tumor Suppressor ,Epigenetics ,Poly-ADP-Ribose Binding Proteins ,Promoter Regions, Genetic ,Proto-Oncogene Proteins c-vav ,lcsh:QH301-705.5 ,Molecular Biology ,Oligonucleotide Array Sequence Analysis ,030304 developmental biology ,Regulation of gene expression ,Genetics ,0303 health sciences ,Gene Expression Profiling ,Tumor Suppressor Proteins ,Applied Mathematics ,Promoter ,Methylation ,DNA Methylation ,3. Good health ,Computer Science Applications ,Collagen Type I, alpha 1 Chain ,DNA-Binding Proteins ,Gene Expression Regulation, Neoplastic ,Gene expression profiling ,DNA Topoisomerases, Type II ,lcsh:Biology (General) ,CpG site ,030220 oncology & carcinogenesis ,DNA methylation ,lcsh:R858-859.7 ,CpG Islands ,Trefoil Factor-1 ,DNA microarray ,Genome-Wide Association Study - Abstract
Background: Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Results: Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation.
Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Conclusions: Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation is documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
- Published
- 2010
14. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution
- Author
-
Stylianos E. Antonarakis, Yontao Lu, Philipp Kapranov, Thomas R. Gingeras, Roderic Guigó, Alexandre Reymond, Michael Snyder, Yijun Ruan, Mark Gerstein, Chia-Lin Wei, Robert Baertsch, Siew Woh Choo, Adam Frankish, Jennifer Harrow, and Deyou Zheng
- Subjects
Primates ,Letter ,Retroelements ,Transcription, Genetic ,Primates/genetics ,Pseudogene ,Retrotransposon ,Computational biology ,Biology ,ENCODE ,Genome ,DNA sequencing ,Cell Line ,Evolution, Molecular ,Species Specificity ,Gene Duplication ,Gene duplication ,Genetics ,Animals ,Humans ,Gene ,Genetics (clinical) ,ddc:616 ,Computational genomics ,Genètica evolutiva ,Sequence Analysis, DNA ,Factors de transcripció ,Pseudogenes - Abstract
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
- Published
- 2007
15. Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing
- Author
-
David Haussler, Lachlan G. Oddy, Richard K. Wilson, Jennifer Randall-Maher, Yontao Lu, LaDeana W. Hillier, Tina Graves, Mark Diekhans, and Terrence S. Furey
- Subjects
Comparative genomics ,Genetics ,Expressed Sequence Tags ,Genome evolution ,Polymorphism, Genetic ,Genome, Human ,Computational Biology ,Genetic Variation ,Genome project ,Sequence Analysis, DNA ,Biology ,Genome survey sequence ,Sequence-tagged site ,RNA editing ,Human Genome Project ,Humans ,Human genome ,RNA Editing ,RNA, Messenger ,Letters ,Genetics (clinical) ,Reference genome - Abstract
The NCBI Reference Sequence (RefSeq) project and the NIH Mammalian Gene Collection (MGC) together define a set of approximately 30,000 nonredundant human mRNA sequences with identified coding regions representing 17,000 distinct loci. These high-quality mRNA sequences allow for the identification of transcribed regions in the human genome sequence, and many researchers accept them as the correct representation of each defined gene sequence. Computational comparison of these mRNA sequences and the recently published essentially finished human genome sequence reveals several thousand undocumented nonsynonymous substitution and frame shift discrepancies between the two resources. Additional analysis is undertaken to verify that the euchromatic human genome is sufficiently complete--containing nearly the whole mRNA collection, thus allowing for a comprehensive analysis to be undertaken. Many of the discrepancies will prove to be genuine polymorphisms in the human population, somatic cell genomic variants, or examples of RNA editing. It is observed that the genome sequence variant has significant additional support from other mRNAs and ESTs, almost four times more often than does the mRNA variant, suggesting that the genome sequence is more accurate. In approximately 15% of these cases, there is substantial support for both variants, suggestive of an undocumented polymorphism. An initial screening against a 24-individual genomic DNA diversity panel verified 60% of a small set of potential single nucleotide polymorphisms from which successful results could be obtained. We also find statistical evidence that a few of these discrepancies are due to RNA editing. Overall, these results suggest that the mRNA collections may contain a substantial number of errors. For current and future mRNA collections, it may be prudent to fully reconcile each genome sequence discrepancy, classifying each as a polymorphism, site of RNA editing or somatic cell variation, or genome sequence error.
- Published
- 2004
16. Comparative recombination rates in the rat, mouse, and human genomes
- Author
-
Michael I. Jensen-Seaman, Michael A. Thomas, Krishna M. Roskin, Chin-Fu Chen, David Haussler, Yontao Lu, Terrence S. Furey, Howard J. Jacob, and Bret A. Payseur
- Subjects
Pseudoautosomal region ,Non-allelic homologous recombination ,Mice, Obese ,Mice, Inbred Strains ,Biology ,Genome ,Chromosomes ,Evolution, Molecular ,Mice ,Species Specificity ,Rats, Inbred BN ,Rats, Inbred SHR ,Genetics ,Animals ,Humans ,Genetics (clinical) ,X chromosome ,Crosses, Genetic ,Synteny ,Recombination, Genetic ,Base Composition ,Genome, Human ,Chromosome ,Genetic Variation ,Articles ,Rats ,Human genome ,Recombination - Abstract
Levels of recombination vary among species, among chromosomes within species, and among regions within chromosomes in mammals. This heterogeneity may affect levels of diversity, efficiency of selection, and genome composition, as well as have practical consequences for the genetic mapping of traits. We compared the genetic maps to the genome sequence assemblies of rat, mouse, and human to estimate local recombination rates across these genomes. Humans have greater overall levels of recombination, as well as greater variance. In rat and mouse, the size of the chromosome and proximity to telomere have less effect on local recombination rate than in human. At the chromosome level, rat and mouse X chromosomes have the lowest recombination rates, whereas human chromosome X does not show the same pattern. In all species, local recombination rate is significantly correlated with several sequence variables, including GC%, CpG density, repetitive elements, and the neutral mutation rate, with some pronounced differences between species. Recombination rate in one species is not strongly correlated with the rate in another, when comparing homologous syntenic blocks of the genome. This comparative approach provides additional insight into the causes and consequences of genomic heterogeneity in recombination.
- Published
- 2004
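As a rough illustration of the comparison described in the abstract above, a local recombination rate can be expressed in cM/Mb by dividing the genetic distance between two flanking markers by their physical distance. The sketch below is only that: the marker positions are hypothetical, the function name is an assumption, and the windowing or curve-fitting choices of the original study are not given in this record and are not reproduced here.

```python
from typing import List, Tuple


def local_recombination_rates(markers: List[Tuple[float, float]]) -> List[float]:
    """Return the rate in cM/Mb for each interval between consecutive markers.

    Each marker is (physical_position_bp, genetic_position_cM).
    """
    ordered = sorted(markers)  # order by physical position
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        megabases = (bp2 - bp1) / 1e6
        if megabases <= 0:
            continue  # skip markers that share a physical position
        rates.append((cm2 - cm1) / megabases)
    return rates


# Hypothetical markers: (base-pair position, genetic map position in cM)
example_markers = [(1_000_000, 0.0), (3_000_000, 1.0), (4_000_000, 3.0)]
rates = local_recombination_rates(example_markers)
print(rates)
```

Per-interval rates like these could then be compared against sequence variables such as GC% within the same intervals.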
17. Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies.
- Author
-
Yun R. Li, van Setten, Jessica, Verma, Shefali S., Yontao Lu, Holmes, Michael V., Hui Gao, Lek, Monkol, Nair, Nikhil, Chandrupatla, Hareesh, Baoli Chang, Karczewski, Konrad J., Wong, Chanel, Mohebnasab, Maede, Mukhtar, Eyas, Phillips, Randy, Tragante, Vinicius, Cuiping Hou, Steel, Laura, Takesha Lee, and Garifallou, James
- Subjects
GENOTYPES ,HUMAN genome ,HOMOGRAFTS ,PHARMACOGENOMICS ,SINGLE nucleotide polymorphisms ,KILLER cells - Abstract
Background: In addition to HLA genetic incompatibility, non-HLA difference between donor and recipients of transplantation leading to allograft rejection are now becoming evident. We aimed to create a unique genome-wide platform to facilitate genomic research studies in transplant-related studies. We designed a genome-wide genotyping tool based on the most recent human genomic reference datasets, and included customization for known and potentially relevant metabolic and pharmacological loci relevant to transplantation. Methods: We describe here the design and implementation of a customized genome-wide genotyping array, the 'TxArray', comprising approximately 782,000 markers with tailored content for deeper capture of variants across HLA, KIR, pharmacogenomic, and metabolic loci important in transplantation. To test concordance and genotyping quality, we genotyped 85 HapMap samples on the array, including eight trios. Results: We show low Mendelian error rates and high concordance rates for HapMap samples (average parent-parent-child heritability of 0.997, and concordance of 0.996). We performed genotype imputation across autosomal regions, masking directly genotyped SNPs to assess imputation accuracy and report an accuracy of >0.962 for directly genotyped SNPs. We demonstrate much higher capture of the natural killer cell immunoglobulin-like receptor (KIR) region versus comparable platforms. Overall, we show that the genotyping quality and coverage of the TxArray is very high when compared to reference samples and to other genome-wide genotyping platforms. Conclusions: We have designed a comprehensive genome-wide genotyping tool which enables accurate association testing and imputation of ungenotyped SNPs, facilitating powerful and cost-effective large-scale genotyping of transplant-related studies. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
18. High-throughput method for analyzing methylation of CpGs in targeted genomic regions.
- Author
-
Nautiyal, Shivani, Carlton, Victoria E. H., Yontao Lu, Ireland, James S., Flaucher, Diane, Moorhead, Martin, Gray, Joe W., Spellman, Paul, Mindrinos, Michael, Berg, Paul, and Faham, Malek
- Subjects
METHYLATION ,NUCLEIC acids ,GENES ,HEREDITY ,DNA - Abstract
A unique microarray-based method for determining the extent of DNA methylation has been developed. It relies on a selective enrichment of the regions to be assayed by target amplification by capture and ligation (mTACL). The assay is quantitatively accurate, relatively precise, and lends itself to high-throughput determination using nanogram amounts of DNA. The measurements using mTACLs are highly reproducible and in excellent agreement with those obtained by sequencing (r = 0.94). In the present work, the methylation status of >145,000 CpGs from 5,472 promoters in 221 samples was measured. The methylation levels of nearby CpGs are correlated, but the correlation falls off dramatically over several hundred base pairs. In some instances, nearby CpGs have very different levels of methylation. Comparison of normal and tumor samples indicates that in tumors, the promoter regions of genes involved in differentiation and signaling are preferentially hypermethylated, whereas those of housekeeping genes remain hypomethylated. mTACL is a platform for profiling the state of methylation of a large number of CpG in many samples in a cost-effective fashion, and is capable of scaling to much larger numbers of CpGs than those collected here. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
19. Prediction of epigenetically regulated genes in breast cancer cell lines.
- Author
-
Loss, Leandro A., Sadanandam, Anguraj, Durinck, Steffen, Nautiyal, Shivani, Flaucher, Diane, Carlton, Victoria E. H., Moorhead, Martin, Yontao Lu, Gray, Joe W., Faham, Malek, Spellman, Paul, and Parvin, Bahram
- Subjects
CANCER cells ,CELL lines ,METHYLATION ,DNA ,GENE expression - Abstract
Background: Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Results: Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Conclusions: Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation is documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
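The abstract above mentions quantifying the significance of a methylation-expression association through permutation analysis. A minimal sketch of that general idea follows. It is not the authors' pipeline: it swaps in a plain Pearson correlation as the test statistic where the paper ranks associations by logistic regression, and the data values, function names, and parameters are all hypothetical.

```python
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(methylation: Sequence[float],
                        expression: Sequence[float],
                        n_perm: int = 999,
                        seed: int = 0) -> float:
    """One-sided p-value for a negative methylation-expression correlation."""
    rng = random.Random(seed)
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the two profiles
        if pearson(methylation, shuffled) <= observed + 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical profiles across eight cell lines: expression falls as methylation rises
methylation = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
expression = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
p_value = permutation_p_value(methylation, expression)
print(p_value)
```

Shuffling one profile while holding the other fixed gives a null distribution for the statistic without assuming any particular parametric form.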
20. Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm
- Author
-
Stephen K. Van Den Eeden, Michael H. Shapero, Larry Walter, Teresa Webster, Neil Risch, Mark N. Kvale, Gangwu Mei, Matthew M. Purdy, Sarah Rowell, Stephanie Hesselson, Jeremy Gollub, Catherine Schaefer, Pui-Yan Kwok, David Smethurst, Carlos Iribarren, Yiping Zhan, Thomas J. Hoffmann, Rachel A. Whitmer, Carol P. Somkin, Charles P. Quesenberry, Andrea Finn, and Yontao Lu
- Subjects
Genome-wide association study ,Coverage ,Genotype ,Pilot Projects ,Single-nucleotide polymorphism ,Computational biology ,Biology ,Microarray ,Polymorphism, Single Nucleotide ,White People ,Article ,03 medical and health sciences ,0302 clinical medicine ,Asian People ,Genetics ,Humans ,SNP ,Genotyping ,Selection algorithm ,Oligonucleotide Array Sequence Analysis ,030304 developmental biology ,Imputation ,0303 health sciences ,Asia, Eastern ,Genome, Human ,Hispanic or Latino ,Throughput ,Single nucleotide polymorphism ,Black or African American ,Pairwise comparison ,Human genome ,Algorithms ,030217 neurology & neurosurgery ,Imputation (genetics) - Abstract
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.
- Full Text
- View/download PDF
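The hybrid selection described in the abstract above alternates rounds of greedy pairwise SNP selection with removal of target SNPs that are already covered by imputation. The sketch below illustrates that control flow under stated assumptions only: `tags` (which targets each candidate SNP tags pairwise), the `covered_by_imputation` callback, and the toy data are all hypothetical, and no real imputation is performed.

```python
from typing import Callable, Dict, List, Set


def greedy_round(tags: Dict[str, Set[str]], targets: Set[str], n_picks: int) -> List[str]:
    """Pick up to n_picks SNPs, each time taking the one tagging the most uncovered targets."""
    remaining = set(targets)
    chosen: List[str] = []
    if not tags:
        return chosen
    for _ in range(n_picks):
        # sorted() makes tie-breaking deterministic (alphabetically first wins)
        best = max(sorted(tags), key=lambda s: len(tags[s] & remaining))
        if not tags[best] & remaining:
            break  # no candidate covers anything new
        chosen.append(best)
        remaining -= tags[best]
    return chosen


def hybrid_select(tags: Dict[str, Set[str]],
                  targets: Set[str],
                  covered_by_imputation: Callable[[List[str], Set[str]], Set[str]],
                  picks_per_round: int,
                  max_rounds: int) -> List[str]:
    """Alternate greedy pairwise rounds with removal of imputation-covered targets."""
    remaining = set(targets)
    chosen: List[str] = []
    for _ in range(max_rounds):
        if not remaining:
            break
        picks = greedy_round(tags, remaining, picks_per_round)
        if not picks:
            break
        chosen.extend(picks)
        remaining -= covered_by_imputation(chosen, remaining)
    return chosen


# Toy data: candidate SNP -> set of target SNPs it tags pairwise (hypothetical)
tags = {"a": {"t1", "t2"}, "b": {"t2", "t3"}, "c": {"t4"}}
targets = {"t1", "t2", "t3", "t4"}


def toy_imputation(chosen: List[str], remaining: Set[str]) -> Set[str]:
    """Stand-in for an imputation run: pairwise-tagged targets count as covered,
    and t4 is assumed to become imputable once both a and b are on the array."""
    covered: Set[str] = set()
    for snp in chosen:
        covered |= tags[snp]
    if {"a", "b"} <= set(chosen):
        covered.add("t4")
    return covered & remaining


pairwise_only = greedy_round(tags, targets, n_picks=3)
hybrid = hybrid_select(tags, targets, toy_imputation, picks_per_round=1, max_rounds=5)
print(pairwise_only, hybrid)
```

In this toy case the hybrid loop needs two SNPs where pure pairwise tagging needs three, which is the kind of capacity saving the abstract attributes to removing imputation-covered targets.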
21. Next generation genome-wide association tool: Design and coverage of a high-throughput European-optimized SNP array
- Author
-
Chia Zau, Charles P. Quesenberry, Tanushree R. Shenoy, Jasmin L Eshragh, Jay Kaufman, Wen Cc, Yang Cao, Simon Wong, Carlos Iribarren, Gurpreet K. Mathauda, Li Weng, Eunice Wan, Christine Aquino, Alan Williams, Larry Walter, Sheryl Connell, Simon Cawley, Ling Shen, Stephen K. Van Den Eeden, Andrea Finn, Rachel A. Whitmer, Dilrini K. Ranatunga, Richard Lao, Yontao Lu, William B. McGuire, Dana Ludwig, Thomas J. Hoffmann, Pui-Yan Kwok, David Smethurst, Sarah Rowell, Stephanie Hesselson, Mary Henderson, Yiping Zhan, Marcia Ewing, Earl Hubbell, Sunita Miles, Marianne Sadler, Matthew M. Purdy, Mark N. Kvale, Gangwu Mei, Catherine Schaefer, Michael H. Shapero, Teresa Webster, Neil Risch, Reid Wearley, Elaine Chung, and Jeremy Gollub
- Subjects
Genome-wide association study ,Coverage ,Single-nucleotide polymorphism ,Computational biology ,Biology ,Microarray ,Polymorphism, Single Nucleotide ,Article ,White People ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Humans ,SNP ,Genotyping ,Oligonucleotide Array Sequence Analysis ,030304 developmental biology ,Genetic association ,0303 health sciences ,Tool design ,Throughput ,High-Throughput Screening Assays ,Single nucleotide polymorphism ,030217 neurology & neurosurgery ,Imputation (genetics) ,SNP array - Abstract
The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.
- Full Text
- View/download PDF
22. Analysis of Human mRNAs With the Reference Genome Sequence Reveals Potential Errors, Polymorphisms, and RNA Editing.
- Author
-
Furey, Terrence S., Diekhans, Mark, Yontao Lu, Graves, Tina A., Oddy, Lachlan, Randall-Maher, Jennifer, Hillier, LaDeana W., Wilson, Richard K., and Haussler, David
- Subjects
*MESSENGER RNA , *HUMAN genome , *GENETIC polymorphisms , *GENETIC transcription , *DNA , *CELLS - Abstract
The NCBI Reference Sequence (RefSeq) project and the NIH Mammalian Gene Collection (MGC) together define a set of ∼30,000 nonredundant human mRNA sequences with identified coding regions representing 17,000 distinct loci. These high-quality mRNA sequences allow for the identification of transcribed regions in the human genome sequence, and many researchers accept them as the correct representation of each defined gene sequence. Computational comparison of these mRNA sequences and the recently published essentially finished human genome sequence reveals several thousand undocumented nonsynonymous substitution and frame shift discrepancies between the two resources. Additional analysis is undertaken to verify that the euchromatic human genome is sufficiently complete—containing nearly the whole mRNA collection, thus allowing for a comprehensive analysis to be undertaken. Many of the discrepancies will prove to be genuine polymorphisms in the human population, somatic cell genomic variants, or examples of RNA editing. It is observed that the genome sequence variant has significant additional support from other mRNAs and ESTs, almost four times more often than does the mRNA variant, suggesting that the genome sequence is more accurate. In ∼15% of these cases, there is substantial support for both variants, suggestive of an undocumented polymorphism. An initial screening against a 24-individual genomic DNA diversity panel verified 60% of a small set of potential single nucleotide polymorphisms from which successful results could be obtained. We also find statistical evidence that a few of these discrepancies are due to RNA editing. Overall, these results suggest that the mRNA collections may contain a substantial number of errors. For current and future mRNA collections, it may be prudent to fully reconcile each genome sequence discrepancy, classifying each as a polymorphism, site of RNA editing or somatic cell variation, or genome sequence error. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
23. Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.
- Author
-
Kvale, Mark N., Hesselson, Stephanie, Hoffmann, Thomas J., Yang Cao, Chan, David, Connell, Sheryl, Croen, Lisa A., Dispensa, Brad P., Eshragh, Jasmin, Finn, Andrea, Gollub, Jeremy, Iribarren, Carlos, Jorgenson, Eric, Kushi, Lawrence H., Lao, Richard, Yontao Lu, Ludwig, Dana, Mathauda, Gurpreet K., McGuire, William B., and Gangwu Mei
- Subjects
*GENOTYPES , *QUALITY control , *GENETIC epidemiology , *HEALTH of adults , *SINGLE nucleotide polymorphisms , *ELECTRONIC health records - Abstract
The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California, San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1–95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
24. Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution.
- Author
-
Deyou Zheng, Frankish, Adam, Baertsch, Robert, Kapranov, Philipp, Reymond, Alexandre, Siew Woh Choo, Yontao Lu, Denoeud, France, Antonarakis, Stylianos E., Snyder, Michael, Ruan, Yijun, Chia-Lin Wei, Gingeras, Thomas R., Guigó, Roderic, Harrow, Jennifer, and Gerstein, Mark B.
- Subjects
*DNA replication , *GENES , *GENOMES , *GENOMICS , *EXONS (Genetics) , *GENETIC transcription , *CELL lines - Abstract
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (~80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library