115 results on '"Su Zhenqiang"'
Search Results
2. Knowledge enhancement and scene understanding for knowledge-based visual question answering
- Author
Su, Zhenqiang and Gou, Gang
- Published
- 2024
3. Constructing a robust protein-protein interaction network by integrating multiple public databases
- Author
Ding Don, Tong Weida, Fang Hong, Ye Yanbin, Su Zhenqiang, Guo Li, Liu Zhichao, Martha Venkata-Swamy, and Xu Xiaowei
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Protein-protein interactions (PPIs) are a critical component for many underlying biological processes. A PPI network can provide insight into the mechanisms of these processes, as well as the relationships among different proteins and toxicants that are potentially involved in the processes. There are many PPI databases publicly available, each with a specific focus. The challenge is how to effectively combine their contents to generate a robust and biologically relevant PPI network.
Methods: In this study, seven public PPI databases, BioGRID, DIP, HPRD, IntAct, MINT, REACTOME, and SPIKE, were used to explore a powerful approach to combine multiple PPI databases for an integrated PPI network. We developed a novel method called k-votes to create seven different integrated networks by using values of k ranging from 1 to 7. Functional modules were mined by using SCAN, a Structural Clustering Algorithm for Networks. Overall module qualities were evaluated for each integrated network using the following statistical and biological measures: (1) modularity, (2) similarity-based modularity, (3) clustering score, and (4) enrichment.
Results: Each integrated human PPI network was constructed based on the number of votes (k) for a particular interaction from the committee of the original seven PPI databases. The performance of functional modules obtained by SCAN from each integrated network was evaluated. The optimal value for k was determined by the functional module analysis. Our results demonstrate that the k-votes method outperforms the traditional union approach in terms of both statistical significance and biological meaning. The best network is achieved at k=2, which is composed of interactions that are confirmed in at least two PPI databases. In contrast, the traditional union approach yields an integrated network that consists of all interactions of seven PPI databases, which might be subject to high false positives.
Conclusions: We determined that the k-votes method for constructing a robust PPI network by integrating multiple public databases outperforms previously reported approaches and that a value of k=2 provides the best results. The developed strategies for combining databases show promise in the advancement of network construction and modeling.
- Published
- 2011
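Illustrative sketch for record 3: the k-votes idea described in that abstract (keep an interaction only if at least k source databases report it) might look roughly like the following. This is not the authors' code; the function name `k_votes`, the toy interaction pairs, and the choice to treat interactions as undirected pairs are all assumptions made for illustration.

```python
from collections import Counter


def k_votes(databases, k):
    """Return the interactions reported by at least k source databases.

    databases: dict mapping a database name to an iterable of (protein, protein) pairs.
    """
    votes = Counter()
    for interactions in databases.values():
        # Treat (A, B) and (B, A) as the same undirected edge, and deduplicate
        # within a database so that each database casts at most one vote per edge.
        edges = {frozenset(pair) for pair in interactions}
        votes.update(edges)
    return {edge for edge, n in votes.items() if n >= k}


# Hypothetical toy data; the real study combined seven databases.
toy = {
    "BioGRID": [("TP53", "MDM2"), ("BRCA1", "BARD1")],
    "IntAct": [("MDM2", "TP53"), ("EGFR", "GRB2")],
    "MINT": [("TP53", "MDM2"), ("BRCA1", "BARD1"), ("BRCA1", "BARD1")],
}

union_network = k_votes(toy, k=1)   # k=1 reproduces the plain union of all databases
robust_network = k_votes(toy, k=2)  # k=2 keeps edges confirmed by at least two databases
```

With k=1 the function returns the plain union, which is the baseline the abstract compares against; raising k trades coverage for confirmation across sources.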
4. Evaluation of gene expression data generated from expired Affymetrix GeneChip® microarrays using MAQC reference RNA samples
- Author
Tong Weida, Hong Huixiao, Su Zhenqiang, Huang Ying, Shi Quan, Wang Charles, Wen Zhining, and Shi Leming
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The Affymetrix GeneChip® system is a commonly used platform for microarray analysis but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may render significant numbers of pre-purchased GeneChip® microarrays unprocessed before their manufacturer’s expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis. In addition, it was not clear whether the two human reference RNA samples established by the MAQC project in 2005 still maintained their transcriptome integrity over a period of four years. Experiments were conducted to answer these questions.
Results: Microarray data were generated in 2009 in three replicates for each of the two MAQC samples with either expired Affymetrix U133A or unexpired U133Plus2 microarrays. These results were compared with data obtained in 2005 on the U133Plus2 microarray. The percentage of overlap between the lists of differentially expressed genes (DEGs) from U133Plus2 microarray data generated in 2009 and in 2005 was 97.44%. While there was some degree of fold change compression in the expired U133A microarrays, the percentage of overlap between the lists of DEGs from the expired and unexpired microarrays was as high as 96.99%. Moreover, the microarray data generated using the expired U133A microarrays in 2009 were highly concordant with microarray and TaqMan® data generated by the MAQC project in 2005.
Conclusions: Our results demonstrated that microarray data generated using U133A microarrays, which were more than four years past the manufacturer’s expiration date, were highly specific and consistent with those from unexpired microarrays in identifying DEGs despite some appreciable fold change compression and decrease in sensitivity. Our data also suggested that the MAQC reference RNA samples, stored at -80°C, were stable over a time frame of at least four years.
- Published
- 2010
5. An FDA bioinformatics tool for microbial genomics research on molecular characterization of bacterial foodborne pathogens using microarrays
- Author
Turner Steve, Ye Yanbin, Su Zhenqiang, Chen James, Foley Steven, Nayak Rajesh, Zou Wen, Frye Jonathan G, Patel Isha R, Jackson Scott A, Ding Don, Xu Joshua, Fang Hong, Harris Steve, Zhou Guangxu, Cerniglia Carl, and Tong Weida
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Advances in microbial genomics and bioinformatics are offering greater insights into the emergence and spread of foodborne pathogens in outbreak scenarios. The Food and Drug Administration (FDA) has developed a genomics tool, ArrayTrack™, which provides extensive functionalities to manage, analyze, and interpret genomic data for mammalian species. ArrayTrack™ has been widely adopted by the research community and used for pharmacogenomics data review in the FDA’s Voluntary Genomics Data Submission program.
Results: ArrayTrack™ has been extended to manage and analyze genomics data from bacterial pathogens of human, animal, and food origin. It was populated with bioinformatics data from public databases such as NCBI, Swiss-Prot, KEGG Pathway, and Gene Ontology to facilitate pathogen detection and characterization. ArrayTrack™’s data processing and visualization tools were enhanced with analysis capabilities designed specifically for microbial genomics including flag-based hierarchical clustering analysis (HCA), flag concordance heat maps, and mixed scatter plots. These specific functionalities were evaluated on data generated from a custom Affymetrix array (FDA-ECSG) previously developed within the FDA. The FDA-ECSG array represents 32 complete genomes of Escherichia coli and Shigella. The new functions were also used to analyze microarray data focusing on antimicrobial resistance genes from Salmonella isolates in a poultry production environment using a universal antimicrobial resistance microarray developed by the United States Department of Agriculture (USDA).
Conclusion: The application of ArrayTrack™ to different microarray platforms demonstrates its utility in microbial genomics research, and thus will improve the capabilities of the FDA to rapidly identify foodborne bacteria and their genetic traits (e.g., antimicrobial resistance, virulence, etc.) during outbreak investigations. ArrayTrack™ is free to use and available to public, private, and academic researchers at http://www.fda.gov/ArrayTrack.
- Published
- 2010
6. The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies
- Author
Su Zhenqiang, Shippy Richard, Puri Raj K, Peterson Ron L, Mei Nan, Ma Yunqing, Luo Yuling, Li Quan-Zhen, Kawasaki Ernest S, Hong Huixiao, Herman Damir, Han Jing, Guo Xu, Fuscoe James C, Frueh Felix W, Fan Xiao-hui, Collins Patrick J, Chu Tzu-Ming, Bertholet Vincent, Cao Xiaoxi, Bao Wenjun, Barbacioru Catalin C, Amur Shashi, Qian Feng, Fang Hong, Boysen Cecilie, Croner Lisa J, Guo Lei, Goodsaid Federico M, Perkins Roger G, Harris Stephen C, Jensen Roderick V, Jones Wendell D, Shi Leming, Sun Yongming, Sun Hongmei, Thorn Brett, Turpaz Yaron, Wang Charles, Wang Sue, Warrington Janet A, Willey James C, Wu Jie, Xie Qian, Zhang Liang, Zhang Lu, Zhong Sheng, Wolfinger Russell D, and Tong Weida
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.
Results: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan – the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
Conclusion: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
- Published
- 2008
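Illustrative sketch for record 6: the recommendation in that abstract (rank by fold change after filtering with a non-stringent P cutoff) could be expressed along these lines. The function name `select_degs`, the record layout, the toy values, and the default cutoff of 0.05 are assumptions for illustration only; the abstract does not state a specific cutoff.

```python
def select_degs(genes, n_top, p_cutoff=0.05):
    """Filter genes by a non-stringent P cutoff, then rank by |log2 fold change|.

    genes: list of dicts with keys "gene", "log2_fc", and "p".
    p_cutoff: assumed illustrative threshold, not a value taken from the paper.
    """
    # Step 1: the P-value acts only as a loose filter.
    passing = [g for g in genes if g["p"] < p_cutoff]
    # Step 2: fold change magnitude, not P, decides the ranking.
    ranked = sorted(passing, key=lambda g: abs(g["log2_fc"]), reverse=True)
    return [g["gene"] for g in ranked[:n_top]]


# Hypothetical toy results table.
toy = [
    {"gene": "G1", "log2_fc": 3.0, "p": 0.04},
    {"gene": "G2", "log2_fc": 0.2, "p": 0.0001},
    {"gene": "G3", "log2_fc": -2.5, "p": 0.01},
    {"gene": "G4", "log2_fc": 4.0, "p": 0.30},
]

top_two = select_degs(toy, n_top=2)
```

Here G2 has the smallest P-value but a negligible fold change, so pure P-ranking would put it first while FC-ranking does not select it; G4 has the largest fold change but fails the P filter.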
7. Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples
- Author
Kaput Jim, Han Tao, Chen James J, Xu Joshua, Fang Hong, Perkins Roger, Shi Leming, Ge Weigong, Su Zhenqiang, Hong Huixiao, Fuscoe James C, and Tong Weida
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
Conclusion: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogenous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
- Published
- 2008
8. Very Important Pool (VIP) genes – an application for microarray-based molecular signatures
- Author
Perkins Roger, Shi Leming, Fang Hong, Hong Huixiao, Su Zhenqiang, and Tong Weida
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Advances in DNA microarray technology portend that molecular signatures from microarrays will eventually be used in clinical environments and personalized medicine. Derivation of biomarkers is a large step beyond hypothesis generation and imposes considerably more stringency for accuracy in identifying informative gene subsets to differentiate phenotypes. The inherent nature of microarray data, with fewer samples and replicates compared to the large number of genes, requires identifying informative genes prior to classifier construction. However, improving the ability to identify differentiating genes remains a challenge in bioinformatics.
Results: A new hybrid gene selection approach was investigated and tested with nine publicly available microarray datasets. The new method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. The method uses a bagging sampling principle, where the re-sampled arrays are used to identify the most informative genes. Frequency of selection is used in a repetitive process to identify the VIP genes. The putative informative genes are selected using two methods, t-statistic and discriminatory analysis. In the t-statistic, the informative genes are identified based on p-values. In the discriminatory analysis, disjoint Principal Component Analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by the p-value ranking approach are also related to the disease investigated. More importantly, these genes are part of the pathways derived from the common genes shared by both the VIP and p-ranking methods. Moreover, the binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
Conclusion: The VIP gene selection approach could identify additional subsets of informative genes that would not always be selected by the p-value ranking method. These genes are likely to be additional true positives since they are a part of pathways identified by the p-value ranking method and expected to be related to the relevant biology. Therefore, these additional genes derived from the VIP method potentially provide valuable biological insights.
- Published
- 2008
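Illustrative sketch for record 8: the bagging-and-selection-frequency idea in that abstract can be approximated as below. This is a simplified stand-in, not the published VIP method: it ranks genes by an absolute Welch-style t-statistic instead of p-values, omits the discriminatory (disjoint PCA) analysis entirely, and every name, parameter, and data value here is hypothetical.

```python
import random
from collections import Counter
from statistics import mean, variance


def abs_t(a, b):
    """Absolute Welch-style t-statistic; the tiny constant avoids division by zero."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (se + 1e-12)


def selection_frequency(class_a, class_b, n_rounds=200, top_n=1, seed=0):
    """Fraction of bootstrap rounds in which each gene ranks among the top_n.

    class_a, class_b: lists of samples (at least two per class),
    each sample a dict mapping gene name to expression value.
    """
    rng = random.Random(seed)
    genes = list(class_a[0])
    counts = Counter()
    for _ in range(n_rounds):
        # Bagging step: resample arrays with replacement within each class.
        boot_a = rng.choices(class_a, k=len(class_a))
        boot_b = rng.choices(class_b, k=len(class_b))
        scores = {
            g: abs_t([s[g] for s in boot_a], [s[g] for s in boot_b])
            for g in genes
        }
        for g in sorted(genes, key=scores.get, reverse=True)[:top_n]:
            counts[g] += 1
    return {g: counts[g] / n_rounds for g in genes}


# Hypothetical toy data: "signal" differs between classes, "noise" does not.
class_a = [
    {"signal": 1.0, "noise": 5.0},
    {"signal": 1.2, "noise": 5.4},
    {"signal": 0.9, "noise": 4.8},
    {"signal": 1.1, "noise": 5.1},
]
class_b = [
    {"signal": 9.0, "noise": 5.2},
    {"signal": 9.3, "noise": 4.9},
    {"signal": 8.8, "noise": 5.3},
    {"signal": 9.1, "noise": 5.0},
]

freq = selection_frequency(class_a, class_b)
```

Genes whose selection frequency stays high across resamples would be the candidates for the pool; the threshold for "high" is a further choice that this sketch leaves open.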
9. Microarray scanner calibration curves: characteristics and implications
- Author
Xu Z Alex, Chen James J, Branham William S, Guo Lei, Goodsaid Federico M, Frueh Felix W, Fang Hong, Puri Raj K, Han Jing, Han Tao, Su Zhenqiang, Tong Weida, Shi Leming, Harris Stephen C, Hong Huixiao, Xie Qian, Perkins Roger G, and Fuscoe James C
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear.
Results: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias.
Conclusion: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve reproducibility of microarray measurements, they appear less effective in improving accuracy.
- Published
- 2005
10. atBioNet – an integrated network analysis tool for genomics and biomarker discovery
- Author
Ding Yijun, Chen Minjun, Liu Zhichao, Ding Don, Ye Yanbin, Zhang Min, Kelly Reagan, Guo Li, Su Zhenqiang, Harris Stephen C, Qian Feng, Ge Weigong, Fang Hong, Xu Xiaowei, and Tong Weida
- Subjects
Protein-protein interaction, Network analysis, Functional module, Disease biomarker, KEGG pathway analysis, Visualization tool, Genomics, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity.
Results: atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis.
Conclusion: atBioNet is a free web-based network analysis tool that provides a systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
- Published
- 2012
11. A randomized phase 2 study of sapanisertib in combination with paclitaxel versus paclitaxel alone in women with advanced, recurrent, or persistent endometrial cancer
- Author
Han, Sileny N., Oza, Amit, Colombo, Nicoletta, Oaknin, Ana, Raspagliesi, Francesco, Wenham, Robert M., Braicu, Elena Ioana, Jewell, Andrea, Makker, Vicky, Krell, Jonathan, Alía, Eva María Guerra, Baurain, Jean-François, Su, Zhenqiang, Neuwirth, Rachel, Vincent, Sylvie, Sedarati, Farhad, Faller, Douglas V., and Scambia, Giovanni
- Published
- 2023
12. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing
- Author
Pan, Bohu, Ren, Luyao, Onuchic, Vitor, Guan, Meijian, Kusko, Rebecca, Bruinsma, Steve, Trigg, Len, Scherer, Andreas, Ning, Baitang, Zhang, Chaoyang, Glidewell-Kenney, Christine, Xiao, Chunlin, Donaldson, Eric, Sedlazeck, Fritz J., Schroth, Gary, Yavas, Gokhan, Grunenwald, Haiying, Chen, Haodong, Meinholz, Heather, Meehan, Joe, Wang, Jing, Yang, Jingcheng, Foox, Jonathan, Shang, Jun, Miclaus, Kelci, Dong, Lianhua, Shi, Leming, Mohiyuddin, Marghoob, Pirooznia, Mehdi, Gong, Ping, Golshani, Rooz, Wolfinger, Russ, Lababidi, Samir, Sahraeian, Sayed Mohammad Ebrahim, Sherry, Steve, Han, Tao, Chen, Tao, Shi, Tieliu, Hou, Wanwan, Ge, Weigong, Zou, Wen, Guo, Wenjing, Bao, Wenjun, Xiao, Wenzhong, Fan, Xiaohui, Gondo, Yoichi, Yu, Ying, Zhao, Yongmei, Su, Zhenqiang, Liu, Zhichao, Tong, Weida, Xiao, Wenming, Zook, Justin M., Zheng, Yuanting, and Hong, Huixiao
- Published
- 2022
13. Building hierarchical structures for 3D scenes with repeated elements
- Author
Zhao, Xi, Su, Zhenqiang, Komura, Taku, and Yang, Xinyu
- Published
- 2020
14. Evaluation of gene expression data generated from expired Affymetrix GeneChip® microarrays using MAQC reference RNA samples
- Author
Wen, Zhining, Wang, Charles, Shi, Quan, Huang, Ying, Su, Zhenqiang, Hong, Huixiao, Tong, Weida, and Shi, Leming
- Published
- 2010
15. Blood molecular markers associated with COVID‐19 immunopathology and multi‐organ damage
- Author
Chen, Yan‐Mei, Zheng, Yuanting, Yu, Ying, Wang, Yunzhi, Huang, Qingxia, Qian, Feng, Sun, Lei, Song, Zhi‐Gang, Chen, Ziyin, Feng, Jinwen, An, Yanpeng, Yang, Jingcheng, Su, Zhenqiang, Sun, Shanyue, Dai, Fahui, Chen, Qinsheng, Lu, Qinwei, Li, Pengcheng, Ling, Yun, Yang, Zhong, Tang, Huiru, Shi, Leming, Jin, Li, Holmes, Edward C, Ding, Chen, Zhu, Tong‐Yu, and Zhang, Yong‐Zhen
- Published
- 2020
16. Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium
- Author
ZHANG, WENQIAN, NG, HUI WEN, SHU, MAO, LUO, HENG, SU, ZHENQIANG, GE, WEIGONG, PERKINS, ROGER, TONG, WEIDA, and HONG, HUIXIAO
- Published
- 2015
17. Studies on abacavir-induced hypersensitivity reaction: a successful example of translation of pharmacogenetics to personalized medicine
- Author
Guo, YongLi, Shi, LeMing, Hong, HuiXiao, Su, ZhenQiang, Fuscoe, James, and Ning, BaiTang
- Published
- 2013
18. Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine
- Author
Hong, HuiXiao, Zhang, WenQian, Shen, Jie, Su, ZhenQiang, Ning, BaiTang, Han, Tao, Perkins, Roger, Shi, LeMing, and Tong, WeiDa
- Published
- 2013
19. Evaluating variations of genotype calling: a potential source of spurious associations in genome-wide association studies
- Author
Hong, Huixiao, Su, Zhenqiang, Ge, Weigong, Shi, Leming, Perkins, Roger, Fang, Hong, Mendrick, Donna, and Tong, Weida
- Published
- 2010
20. Consensus analysis of multiple classifiers using non-repetitive variables: Diagnostic application to microarray gene expression data
- Author
Su, Zhenqiang, Hong, Huixiao, Perkins, Roger, Shao, Xueguang, Cai, Wensheng, and Tong, Weida
- Published
- 2007
21. EADB: An Estrogenic Activity Database for Assessing Potential Endocrine Activity
- Author
Shen, Jie, Xu, Lei, Fang, Hong, Richard, Ann M., Bray, Jeffrey D., Judson, Richard S., Zhou, Guangxu, Colatsky, Thomas J., Aungst, Jason L., Teng, Christina, Harris, Steve C., Ge, Weigong, Dai, Susie Y., Su, Zhenqiang, Jacobs, Abigail C., Harrouk, Wafa, Perkins, Roger, Tong, Weida, and Hong, Huixiao
- Published
- 2013
22. The Liver Toxicity Biomarker Study Phase I: Markers for the Effects of Tolcapone or Entacapone
- Author
MCBURNEY, ROBERT N., HINES, WADE M., VONTUNGELN, LINDA S., SCHNACKENBERG, LAURA K., BEGER, RICHARD D., MOLAND, CARRIE L., HAN, TAO, FUSCOE, JAMES C., CHANG, CHING-WEI, CHEN, JAMES J., SU, ZHENQIANG, FAN, XIAO-HUI, TONG, WEIDA, BOOTH, SHELAGH A., BALASUBRAMANIAN, RAJI, COURCHESNE, PAUL L., CAMPBELL, JENNIFER M., GRABER, ARMIN, GUO, YU, JUHASZ, PETER, LI, TRICIA Y., LYNCH, MOIRA D., MOREL, NICOLE M., PLASTERER, THOMAS N., TAKACH, EDWARD J., ZENG, CHENHUI, and BELAND, FREDERICK A.
- Published
- 2012
23. Identification of Urinary microRNA Profiles in Rats That May Diagnose Hepatotoxicity
- Author
Yang, Xi, Greenhaw, James, Shi, Qiang, Su, Zhenqiang, Qian, Feng, Davis, Kelly, Mendrick, Donna L., and Salminen, William F.
- Published
- 2012
24. The Liver Toxicity Biomarker Study: Phase I Design and Preliminary Results
- Author
Mcburney, Robert N., Hines, Wade M, Von Tungeln, Linda S., Schnackenberg, Laura K., Beger, Richard D., Moland, Carrie L., Han, Tao, Fuscoe, James C., Chang, Ching-Wei, Chen, James J., Su, Zhenqiang, Fan, Xiao-Hui, Tong, Weida, Booth, Shelagh A., Balasubramanian, Raji, Courchesne, Paul L., Campbell, Jennifer M., Graber, Armin, Guo, Yu, Juhasz, Peter J., Li, Tricin Y., Lynch, Moira D., Morel, Nicole M., Plasterer, Thomas N., Takach, Edward J., Zeng, Chenhui, and Beland, Frederick A.
- Published
- 2009
25. Image-tag-based indoor localization using end-to-end learning.
- Author
Alarfaj, Mohammed, Su, Zhenqiang, Liu, Raymond, Al-Humam, Abdulaziz, and Liu, Huaping
- Subjects
*DEEP learning, *IMAGE registration
- Abstract
Image or feature matching-based indoor localization still faces many technical challenges. Image-tag-based schemes using pose estimation are accurate and robust, but they still cannot be deployed widely because their performance degrades significantly when the tag-camera distance is large, which requires densely distributed tags, and the designed system generally is specific to some special tags and lenses. Also, the lens distortion degrades the performance appreciably and is difficult to correct, especially for the wide-angle lenses. This article develops an image-tag-based indoor localization system using end-to-end learning to overcome these issues. It is a deep learning–based system that can learn the mapping from the original tag image to the final 2D location directly from training examples through self-learned features. It achieves consistent performance even when the tag-camera distance is large or when the image has a low resolution. The mapping learned by the deep learning model factors in all kinds of distortions without requiring any distortion estimation. The tag design is based on shape features to make it robust to lighting changes. The system can be easily adapted to new lenses/cameras and/or new tags. Thus, it facilitates easy and rapid deployment without requiring knowledge from domain experts. A drawback of the general deep learning model is its high computational requirements. We discuss practical solutions to enable real-time applications of the proposed scheme even when it is running on a mobile or embedded device. The performance of the proposed scheme is evaluated via a set of experiments in a real setting and has achieved less than 20 cm of positioning errors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
26. Erratum to: Studies on abacavir-induced hypersensitivity reaction: a successful example of translation of pharmacogenetics to personalized medicine
- Author
-
Guo, YongLi, Shi, LeMing, Hong, HuiXiao, Su, ZhenQiang, Fuscoe, James, and Ning, BaiTang
- Published
- 2013
- Full Text
- View/download PDF
27. Erratum to: Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine
- Author
-
Hong, HuiXiao, Zhang, WenQian, Shen, Jie, Su, ZhenQiang, Ning, BaiTang, Han, Tao, Perkins, Roger, Shi, LeMing, and Tong, WeiDa
- Published
- 2013
- Full Text
- View/download PDF
28. The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies
- Author
-
Shi, Leming, Jones, Wendell, Jensen, Roderick, Harris, Stephen, Perkins, Roger, Goodsaid, Federico, Guo, Lei, Croner, Lisa, Boysen, Cecilie, Fang, Hong, Amur, Shashi, Bao, Wenjun, Barbacioru, Catalin, Bertholet, Vincent, Cao, Xiaoxi Megan, Chu, Tzu-Ming, Collins, Patrick, Fan, Xiao-hui, Frueh, Felix, Fuscoe, James, Guo, Xu, Han, Jing, Herman, Damir, Hong, Huixiao, Kawasaki, Ernest, Li, Quan-Zhen, Luo, Yuling, Ma, Yunqing, Mei, Nan, Peterson, Ron, Puri, Raj, Qian, Feng, Shippy, Richard, Su, Zhenqiang, Sun, Yongming Andrew, Sun, Hongmei, Thorn, Brett, Turpaz, Yaron, Wang, Charles, Wang, Sue-Jane, Warrington, Janet, Willey, James, Wu, Jie, Xie, Qian, Zhang, Liang, Zhang, Lu, Zhong, Sheng, Chen, James, Wolfinger, Russell, and Tong, Weida
- Published
- 2007
- Full Text
- View/download PDF
29. Building hierarchical structures for 3D scenes based on normalized cut.
- Author
-
Zhao, Xi, Su, Zhenqiang, and Yang, Xinyu
- Subjects
POINT cloud, SURFACE interactions, TEST methods
- Abstract
The growing number of 3D scene data available online brings new challenges for scene retrieval, understanding, and synthesis. Traditional shape processing methods have difficulty managing 3D scenes because such methods ignore the contextual information, that is, the spatial relationship between the objects or groups of objects, which plays a significant role in describing scenes. Therefore, a context‐aware representation is needed to deal with such a problem. In this paper, we propose a method to build scene hierarchies based on contextual information. Given a 3D scene, we first use the interaction bisector surface to measure the affinity between different objects/elements of the scene and then apply the normalized cut method to build a hierarchical structure for the whole scene. The resulting hierarchical structure contains not only the relationship between the individual objects but also the relationship between object groups, which provides much richer information about the scene compared with a flat structure that only describes the contacts or affinity between the individual objects. We test our method using several public databases and show that the resulting structure is more consistent with the ground truth. We also show that our method can be used for point cloud segmentation and outperforms previous methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
30. Comparing Bortezomib-Lenalidomide-Dexamethasone (VRd) with Carfilzomib-Lenalidomide-Dexamethasone (KRd) in the Patients with Newly Diagnosed Multiple Myeloma (NDMM) in Two Observational Studies
- Author
-
Li, Bin, Ren, Kaili, Shen, Lei, Hou, Peijie, Su, Zhenqiang, Di Bacco, Alessandra, Hong, Jin-Liern, Galaznik, Aaron, Dash, Ajeeta B, Crossland, Victoria, Dolin, Paul, and Szalma, Sandor
- Published
- 2018
- Full Text
- View/download PDF
31. Prognostic relevance and performance characteristics of serum IGFBP‐2 and PAPP‐A in women with breast cancer: a long‐term Danish cohort study.
- Author
-
Espelund, Ulrick, Renehan, Andrew G., Cold, Søren, Oxvig, Claus, Lancashire, Lee, Su, Zhenqiang, Flyvbjerg, Allan, and Frystyk, Jan
- Subjects
BLOOD serum analysis, PROGNOSTIC tests, BREAST cancer diagnosis, GROWTH factors, INSULIN-like growth factor receptors, BIOLOGICAL tags
- Abstract
Measurement of circulating insulin‐like growth factors (IGFs), in particular IGF‐binding protein (IGFBP)‐2, at the time of diagnosis, is independently prognostic in many cancers, but its clinical performance against other routinely determined prognosticators has not been examined. We measured IGF‐I, IGF‐II, pro‐IGF‐II, IGF bioactivity, IGFBP‐2, ‐3, and pregnancy‐associated plasma protein A (PAPP‐A), an IGFBP regulator, in baseline samples of 301 women with breast cancer treated on four protocols (Odense, Denmark: 1993–1998). We evaluated performance characteristics (expressed as area under the curve, AUC) using Cox regression models to derive hazard ratios (HR) with 95% confidence intervals (CIs) for 10‐year recurrence‐free survival (RFS) and overall survival (OS), and compared those against the clinically used Nottingham Prognostic Index (NPI). We measured the same biomarkers in 531 noncancer individuals to assess multidimensional relationships (MDR), and evaluated additional prognostic models using survival artificial neural network (SANN) and survival support vector machines (SSVM), as these enhance capture of MDRs. For RFS, increasing concentrations of circulating IGFBP‐2 and PAPP‐A were independently prognostic [HR per biomarker doubling: 1.474 (95% CI: 1.160, 1.875; P = 0.002) and 1.952 (95% CI: 1.364, 2.792; P < 0.001), respectively]. The AUC for RFS with NPI was 0.626 (Cox model), improving to 0.694 (P = 0.012) with the addition of IGFBP‐2 plus PAPP‐A. AUCs for RFS derived using SANN and SSVM were not superior. Similar patterns were observed for OS. These findings illustrate an important principle in biomarker qualification: measured circulating biomarkers may demonstrate independent prognostication, but this does not necessarily translate into substantial improvement in clinical performance. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
32. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance.
- Author
-
Wang, Charles, Gong, Binsheng, Bushel, Pierre R, Thierry-Mieg, Jean, Thierry-Mieg, Danielle, Xu, Joshua, Fang, Hong, Hong, Huixiao, Shen, Jie, Su, Zhenqiang, Meehan, Joe, Li, Xiaojin, Yang, Lu, Li, Haiqing, Łabaj, Paweł P, Kreil, David P, Megherbi, Dalila, Gaj, Stan, Caiment, Florian, and van Delft, Joost
- Subjects
RNA sequencing, MICROARRAY technology, GENE expression, BIOCOMPLEXITY, DECISION making, LABORATORY rats
- Abstract
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
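Illustrative note on entry 32: the abstract reports cross-platform concordance of DEG lists without defining the metric. The sketch below assumes one simple choice, the share of genes common to both lists out of all genes in either list; the function name and the two example gene lists are made up for illustration and are not taken from the study.

```python
def deg_concordance(degs_a, degs_b):
    """Percent of genes shared by two DEG lists, relative to their union.

    This overlap-over-union definition is an assumption for illustration;
    the study may have used a different concordance measure.
    """
    a, b = set(degs_a), set(degs_b)
    union = a | b
    if not union:
        return 0.0  # two empty lists: report zero rather than divide by zero
    return 100.0 * len(a & b) / len(union)


# Hypothetical DEG lists for one treatment, one per platform.
rnaseq_degs = ["Cyp1a1", "Cyp1a2", "Nqo1", "Gsta2"]
microarray_degs = ["Cyp1a1", "Nqo1", "Gsta2", "Ugt1a6"]

# Three genes are shared out of five distinct genes overall.
print(deg_concordance(rnaseq_degs, microarray_degs))
```

Computing this value per treatment and plotting it against a treatment effect size would be one way to examine the kind of linear relationship the abstract describes.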
33. Toxicogenomics and Cancer Susceptibility: Advances with Next-Generation Sequencing.
- Author
-
Ning, Baitang, Su, Zhenqiang, Mei, Nan, Hong, Huixiao, Deng, Helen, Shi, Leming, Fuscoe, James C., and Tolleson, William H.
- Subjects
*TOXICOGENOMICS, *DISEASE susceptibility, *GENETIC disorders, *NUCLEOTIDE sequence, *DNA damage, *DNA repair, *CARCINOGENESIS
- Abstract
The aim of this review is to comprehensively summarize the recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act as either high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual's susceptibility to cancer. Somatic mutations, resulting from either DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide us opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights in understanding cancer biology, and in improving cancer diagnosis and therapy. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
34. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.
- Author
-
Shi, Leming, Campbell, Gregory, Jones, Wendell D, Campagne, Fabien, Wen, Zhining, Walker, Stephen J, Su, Zhenqiang, Chu, Tzu-Ming, Goodsaid, Federico M, Pusztai, Lajos, Shaughnessy, John D, Oberthuer, André, Thomas, Russell S, Paules, Richard S, Fielden, Mark, Barlogie, Bart, Chen, Weijie, Du, Pan, Fischer, Matthias, and Furlanello, Cesare
- Subjects
GENE expression, DNA microarrays, LIVER diseases, TOXICOLOGY, BREAST cancer, MULTIPLE myeloma, NEUROBLASTOMA
- Abstract
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
35. An FDA bioinformatics tool for microbial genomics research on molecular characterization of bacterial foodborne pathogens using microarrays.
- Author
-
Fang, Hong, Xu, Joshua, Ding, Don, Jackson, Scott A., Patel, Isha R., Frye, Jonathan G., Zou, Wen, Nayak, Rajesh, Foley, Steven, Chen, James, Su, Zhenqiang, Ye, Yanbin, Turner, Steve, Harris, Steve, Zhou, Guangxu, Cerniglia, Carl, and Tong, Weida
- Subjects
BIOINFORMATICS, GENOMICS, FOOD pathogens
- Abstract
Background: Advances in microbial genomics and bioinformatics are offering greater insights into the emergence and spread of foodborne pathogens in outbreak scenarios. The Food and Drug Administration (FDA) has developed a genomics tool, ArrayTrack™, which provides extensive functionalities to manage, analyze, and interpret genomic data for mammalian species. ArrayTrack™ has been widely adopted by the research community and used for pharmacogenomics data review in the FDA's Voluntary Genomics Data Submission program. Results: ArrayTrack™ has been extended to manage and analyze genomics data from bacterial pathogens of human, animal, and food origin. It was populated with bioinformatics data from public databases such as NCBI, Swiss-Prot, KEGG Pathway, and Gene Ontology to facilitate pathogen detection and characterization. ArrayTrack™'s data processing and visualization tools were enhanced with analysis capabilities designed specifically for microbial genomics including flag-based hierarchical clustering analysis (HCA), flag concordance heat maps, and mixed scatter plots. These specific functionalities were evaluated on data generated from a custom Affymetrix array (FDA-ECSG) previously developed within the FDA. The FDA-ECSG array represents 32 complete genomes of Escherichia coli and Shigella. The new functions were also used to analyze microarray data focusing on antimicrobial resistance genes from Salmonella isolates in a poultry production environment using a universal antimicrobial resistance microarray developed by the United States Department of Agriculture (USDA). Conclusion: The application of ArrayTrack™ to different microarray platforms demonstrates its utility in microbial genomics research, and thus will improve the capabilities of the FDA to rapidly identify foodborne bacteria and their genetic traits (e.g., antimicrobial resistance, virulence, etc.) during outbreak investigations. ArrayTrack™ is free to use and available to public, private, and academic researchers at http://www.fda.gov/ArrayTrack. [ABSTRACT FROM AUTHOR]
- Published
- 2010
36. Microarray platform consistency is revealed by biologically functional analysis of gene expression profiles.
- Author
-
Li, Zhiguang, Su, Zhenqiang, Wen, Zhining, Shi, Leming, and Chen, Tao
- Subjects
*DNA microarrays, *FUNCTIONAL analysis, *GENE expression, *CARCINOGENICITY, *LABORATORY rats
- Abstract
Background: Several different microarray platforms are available for measuring gene expression. There are disagreements within the microarray scientific community for intra- and inter-platform consistency of these platforms. Both high and low consistencies were demonstrated across different platforms in terms of genes with significantly differential expression. Array studies for gene expression are used to explore biological causes and effects. Therefore, consistency should eventually be evaluated in a biological setting to reveal the functional differences between the examined samples, not just a list of differentially expressed genes (DEG). In this study, we investigated whether different platforms had a high consistency from the biologically functional perspective. Results: DEG data without filtering the different probes in microarrays from different platforms generated from kidney samples of rats treated with the kidney carcinogen, aristolochic acid, in five test sites using microarrays from Affymetrix, Applied Biosystems, Agilent, and GE health platforms (two sites using Affymetrix for intra-platform comparison) were input into the Ingenuity Pathway Analysis (IPA) system for functional analysis. The functions of the DEG lists determined by IPA were compared across the four different platforms and two test sites for Affymetrix platform. Analysis results showed that there is a very high level of consistency between the two test sites using the same platform or among different platforms. The top functions determined by the different platforms were very similar and reflected carcinogenicity and toxicity of aristolochic acid in the rat kidney. Conclusion: Our results demonstrate that highly consistent biological information can be generated from different microarray platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
37. A Partial Least Squares‐Based Consensus Regression Method for the Analysis of Near‐Infrared Complex Spectral Data of Plant Samples.
- Author
-
Su, Zhenqiang, Tong, Weida, Shi, Leming, Shao, Xueguang, and Cai, Wensheng
- Subjects
*REGRESSION analysis, *CALIBRATION, *CORN, *MOISTURE, *PROTEINS, *STARCH
- Abstract
A consensus regression approach based on partial least squares (PLS) regression, named cPLS, was investigated for calibrating NIR data. In this approach, multiple independent PLS models were developed and integrated into a single consensus model. The utility and merits of the cPLS method were demonstrated by comparing its results with those from a regular PLS method in predicting moisture, oil, protein, and starch contents of corn samples using the NIR spectral data. It was found that cPLS was superior to regular PLS with respect to prediction accuracy and robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
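Illustrative note on entry 37: the abstract says only that multiple independent PLS models were built and integrated into one consensus model. The sketch below shows the general consensus idea under loudly stated assumptions: single-predictor ordinary least squares stands in for PLS, each member model is trained on a random subset of the samples, and the consensus prediction is the plain average of member predictions. None of these choices is confirmed by the abstract, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in for PLS)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_consensus(xs, ys, n_models=25, frac=0.7, seed=0):
    """Fit n_models member models, each on a random subset of the samples."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    size = max(2, int(frac * len(xs)))  # at least two points per member model
    models = []
    for _ in range(n_models):
        subset = rng.sample(indices, size)
        models.append(fit_line([xs[i] for i in subset], [ys[i] for i in subset]))
    return models


def predict_consensus(models, x):
    """Consensus prediction: unweighted mean of the member predictions."""
    return mean(slope * x + intercept for slope, intercept in models)


# Noise-free toy calibration data following y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
models = fit_consensus(xs, ys)
print(predict_consensus(models, 12.0))
```

With noisy data the member models would disagree slightly, and averaging them is what could yield the improved robustness the abstract attributes to the consensus approach.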
38. Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential.
- Author
-
Shi, Leming, Tong, Weida, Fang, Hong, Scherf, Uwe, Han, Jing, Puri, Raj K., Frueh, Felix W., Goodsaid, Federico M., Guo, Lei, Su, Zhenqiang, Han, Tao, Fuscoe, James C., Xu, Z. Alex, Patterson, Tucker A., Hong, Huixiao, Xie, Qian, Perkins, Roger G., Chen, James J., and Casciano, Daniel A.
- Subjects
DNA microarrays, DATA analysis, RNA, NUCLEIC acids, GENES
- Abstract
Background: The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630-631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676-5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology. Results: We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall. Conclusion: Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
39. Microarray scanner calibration curves: characteristics and implications.
- Author
-
Shi, Leming, Tong, Weida, Su, Zhenqiang, Han, Tao, Han, Jing, Puri, Raj K., Fang, Hong, Frueh, Felix W., Goodsaid, Federico M., Guo, Lei, Branham, William S., Chen, James J., Xu, Z. Alex, Harris, Stephen C., Hong, Huixiao, Xie, Qian, Perkins, Roger G., and Fuscoe, James C.
- Subjects
MESSENGER RNA, CALIBRATION, STANDARDIZATION, PROTEIN microarrays, SCANNING systems
- Abstract
Background: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. Results: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. Conclusion: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve reproducibility of microarray measurements, they appear less effective in improving accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
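Illustrative note on entry 39: the abstract describes ratio underestimation caused by nonlinear calibration curves and proposes computing ratios from concentrations estimated via those curves. The sketch below assumes a hypothetical saturating curve with a constant background; the functional form, the parameter values, and the use of one shared curve for both channels are simplifying assumptions, not the curves measured in the study (which reports that the Cy5 and Cy3 channels behave differently).

```python
def intensity(conc, i_max=65535.0, k=50.0, background=100.0):
    """Hypothetical calibration curve: background plus a saturating signal."""
    return background + i_max * conc / (conc + k)


def concentration(inten, i_max=65535.0, k=50.0, background=100.0):
    """Invert the hypothetical curve to estimate dye concentration."""
    signal = inten - background
    return k * signal / (i_max - signal)


# Assumed true dye concentrations in the two channels (true ratio = 4).
c_cy5, c_cy3 = 40.0, 10.0

# Ratio of raw intensities: compressed by saturation and background.
raw_ratio = intensity(c_cy5) / intensity(c_cy3)

# Ratio of concentrations recovered through the calibration curve.
corrected_ratio = concentration(intensity(c_cy5)) / concentration(intensity(c_cy3))

print(f"raw ratio: {raw_ratio:.3f}, corrected ratio: {corrected_ratio:.3f}")
```

Under these assumptions the raw intensity ratio falls well below the true ratio of 4, while the ratio computed from curve-estimated concentrations recovers it, which mirrors the correction idea stated in the abstract.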
40. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction
- Author
-
Zhang, Wenqian, Yu, Ying, Hertwig, Falk, Thierry-Mieg, Jean, Zhang, Wenwei, Thierry-Mieg, Danielle, Wang, Jian, Furlanello, Cesare, Devanarayan, Viswanath, Cheng, Jie, Deng, Youping, Hero, Barbara, Hong, Huixiao, Jia, Meiwen, Li, Li, Lin, Simon M, Nikolsky, Yuri, Oberthuer, André, Qing, Tao, Su, Zhenqiang, Volland, Ruth, Wang, Charles, Wang, May D., Ai, Junmei, Albanese, Davide, Asgharzadeh, Shahab, Avigad, Smadar, Bao, Wenjun, Bessarabova, Marina, Brilliant, Murray H., Brors, Benedikt, Chierici, Marco, Chu, Tzu-Ming, Zhang, Jibin, Grundy, Richard G., He, Min Max, Hebbring, Scott, Kaufman, Howard L., Lababidi, Samir, Lancashire, Lee J., Li, Yan, Lu, Xin X., Luo, Heng, Ma, Xiwen, Ning, Baitang, Noguera, Rosa, Peifer, Martin, Phan, John H., Roels, Frederik, Rosswog, Carolina, Shao, Susan, Shen, Jie, Theissen, Jessica, Tonini, Gian Paolo, Vandesompele, Jo, Wu, Po-Yen, Xiao, Wenzhong, Xu, Joshua, Xu, Weihong, Xuan, Jiekun, Yang, Yong, Ye, Zhan, Dong, Zirui, Zhang, Ke K., Yin, Ye, Zhao, Chen, Zheng, Yuanting, Wolfinger, Russell D., Shi, Tieliu, Malkas, Linda H., Berthold, Frank, Wang, Jun, Tong, Weida, Shi, Leming, Peng, Zhiyu, and Fischer, Matthias
- Abstract
Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. Conclusions: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice. 
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0694-1) contains supplementary material, which is available to authorized users.
- Published
- 2015
- Full Text
- View/download PDF
41. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages.
- Author
-
Yu, Ying, Fuscoe, James C., Zhao, Chen, Guo, Chao, Jia, Meiwen, Qing, Tao, Bannon, Desmond I., Lancashire, Lee, Bao, Wenjun, Du, Tingting, Luo, Heng, Su, Zhenqiang, Jones, Wendell D., Moland, Carrie L., Branham, William S., Qian, Feng, Ning, Baitang, Li, Yan, Hong, Huixiao, and Guo, Lei
- Published
- 2014
- Full Text
- View/download PDF
42. ChemInform Abstract: Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics.
- Author
-
Hong, Huixiao, Xie, Qian, Ge, Weigong, Qian, Feng, Fang, Hong, Shi, Leming, Su, Zhenqiang, Perkins, Roger, and Tong, Weida
- Published
- 2008
- Full Text
- View/download PDF
43. Combined miRNA transcriptome and proteome analysis of extracellular vesicles in urine and blood from the Pompe mouse model.
- Author
-
Merberg D, Moreland R, Su Z, Li B, Crooker B, Palmieri K, Moore SW, Melber A, Boyanapalli R, Carey G, and Makhija M
- Subjects
- Animals, Mice, Extracellular Vesicles metabolism, Exosomes metabolism, Exosomes genetics, Biomarkers urine, Biomarkers blood, Male, alpha-Glucosidases genetics, alpha-Glucosidases urine, alpha-Glucosidases blood, alpha-Glucosidases metabolism, Genetic Therapy methods, MicroRNAs blood, MicroRNAs urine, Glycogen Storage Disease Type II genetics, Glycogen Storage Disease Type II blood, Glycogen Storage Disease Type II urine, Disease Models, Animal, Proteome metabolism, Transcriptome
- Abstract
Introduction: Acid α-glucosidase (GAA) is a lysosomal enzyme that hydrolyzes glycogen to glucose. Deficiency of GAA causes Pompe disease (PD), also known as glycogen storage disease type II. The resulting glycogen accumulation causes a spectrum of disease severity ranging from infantile-onset PD to adult-onset PD. Additional non-invasive biomarkers of disease severity are needed to monitor response to therapeutic interventions. Methods: We measured protein and miRNA abundance in exosomes from serum and urine from the PD mouse model (B6;129-GaaTm1Rabn/J), wild-type mice, and PD mice treated with a candidate gene therapy. Results: There were significant differences in the abundance of 113 miRNAs in serum exosomes from Pompe versus healthy mice. Levels of miR-206, miR-133, miR-1a, miR-486, and other important regulators of muscle development and maintenance were altered in the Pompe samples. The serum and urine exosome proteomes of healthy and Pompe mice also differed broadly. Several of the dysregulated proteins are encoded by genes with potential target sites for affected miRNA. Conclusion: Exosomes derived from urine or serum are a potential source of biomarkers for Pompe Disease. Further study of the differences in the miRNA transcriptome and proteome content of exosomes may yield new insights into disease mechanisms.
- Published
- 2024
- Full Text
- View/download PDF
44. Integrated microRNA, mRNA, and protein expression profiling reveals microRNA regulatory networks in rat kidney treated with a carcinogenic dose of aristolochic acid.
- Author
-
Li Z, Qin T, Wang K, Hackenberg M, Yan J, Gao Y, Yu LR, Shi L, Su Z, and Chen T
- Subjects
- Animals, Gene Expression Regulation, Neoplastic drug effects, Gene Regulatory Networks drug effects, High-Throughput Nucleotide Sequencing, Kidney metabolism, Kidney Neoplasms etiology, Male, MicroRNAs analysis, Molecular Sequence Data, Proteomics, RNA, Messenger analysis, Rats, Sequence Analysis, RNA, Aristolochic Acids toxicity, Carcinogens toxicity, Kidney drug effects, Kidney Neoplasms genetics, Kidney Neoplasms metabolism, RNA, Neoplasm analysis
- Abstract
Background: Aristolochic Acid (AA), a natural component of Aristolochia plants that is found in a variety of herbal remedies and health supplements, is classified as a Group 1 carcinogen by the International Agency for Research on Cancer. Given that microRNAs (miRNAs) are involved in cancer initiation and progression and their role remains unknown in AA-induced carcinogenesis, we examined genome-wide AA-induced dysregulation of miRNAs as well as the regulation of miRNAs on their target gene expression in rat kidney. Results: We treated rats with 10 mg/kg AA and vehicle control for 12 weeks and eight kidney samples (4 for the treatment and 4 for the control) were used for examining miRNA and mRNA expression by deep sequencing, and protein expression by proteomics. AA treatment resulted in significant differential expression of miRNAs, mRNAs and proteins as measured by both principal component analysis (PCA) and hierarchical clustering analysis (HCA). Specifically, 63 miRNAs (adjusted p value < 0.05 and fold change > 1.5), 6,794 mRNAs (adjusted p value < 0.05 and fold change > 2.0), and 800 proteins (fold change > 2.0) were significantly altered by AA treatment. The expression of 6 selected miRNAs was validated by quantitative real-time PCR analysis. Ingenuity Pathways Analysis (IPA) showed that cancer is the top network and disease associated with those dysregulated miRNAs. To further investigate the influence of miRNAs on kidney mRNA and protein expression, we combined proteomic and transcriptomic data in conjunction with miRNA target selection as confirmed and reported in miRTarBase. In addition to translational repression and transcriptional destabilization, we also found that miRNAs and their target genes were expressed in the same direction at levels of transcription (169) or translation (227). Furthermore, we identified that up-regulation of 13 oncogenic miRNAs was associated with translational activation of 45 out of 54 cancer-related targets. Conclusions: Our findings suggest that dysregulated miRNA expression plays an important role in AA-induced carcinogenesis in rat kidney, and that the integrated approach of multiple profiling provides a new insight into a post-transcriptional regulation of miRNAs on their target repression and activation on a genome-wide scale.
- Published
- 2015
- Full Text
- View/download PDF
45. An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era.
- Author
-
Su Z, Fang H, Hong H, Shi L, Zhang W, Zhang W, Zhang Y, Dong Z, Lancashire LJ, Bessarabova M, Yang X, Ning B, Gong B, Meehan J, Xu J, Ge W, Perkins R, Fischer M, and Tong W
- Subjects
- Algorithms, Animals, Computational Biology methods, Humans, Models, Genetic, Oligonucleotide Array Sequence Analysis, Rats, Gene Expression Profiling methods, Genetic Markers, RNA analysis, Sequence Analysis, RNA
- Abstract
Background: Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment? Results: We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined. Conclusions: Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.
- Published
- 2014
- Full Text
- View/download PDF
46. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures.
- Author
-
Munro SA, Lund SP, Pine PS, Binder H, Clevert DA, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H, Jafari N, Kreil DP, Łabaj PP, Li S, Liao Y, Lin SM, Meehan J, Mason CE, Santoyo-Lopez J, Setterquist RA, Shi L, Shi W, Smyth GK, Stralis-Pavese N, Su Z, Tong W, Wang C, Wang J, Xu J, Ye Z, Yang Y, Yu Y, and Salit M
- Subjects
- Gene Expression Profiling standards, Humans, Reference Standards, Reproducibility of Results, Gene Expression Profiling methods, RNA, Messenger genetics
- Abstract
There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.
- Published
- 2014
- Full Text
- View/download PDF
47. Transcriptomic profiling of rat liver samples in a comprehensive study design by RNA-Seq.
- Author
-
Gong B, Wang C, Su Z, Hong H, Thierry-Mieg J, Thierry-Mieg D, Shi L, Auerbach SS, Tong W, and Xu J
- Subjects
- Alternative Splicing, Animals, Gene Library, Rats, Gene Expression Profiling, Liver, RNA genetics, Transcriptome
- Abstract
RNA-Seq provides the capability to characterize the entire transcriptome at multiple levels, including gene expression, allele-specific expression, alternative splicing, and fusion gene detection. The US FDA-led SEQC (i.e., MAQC-III) project conducted a comprehensive study focused on the transcriptome profiling of rat liver samples treated with 27 chemicals to evaluate the utility of RNA-Seq in safety assessment and toxicity mechanism elucidation. The chemicals represented multiple chemogenomic modes of action (MOA) and exhibited varying degrees of transcriptional response. The paired-end 100 bp sequencing data were generated using Illumina HiScanSQ and/or HiSeq 2000. In addition to the core study, six animals (i.e., three aflatoxin B1 treated rats and three vehicle control rats) were sequenced three times, with two separate library preparations on two sequencing machines. This large toxicogenomics dataset can serve as a resource to characterize various aspects of transcriptomic changes (e.g., alternative splicing) that are byproducts of chemical perturbation.
- Published
- 2014
- Full Text
- View/download PDF
48. Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq.
- Author
-
Xu J, Su Z, Hong H, Thierry-Mieg J, Thierry-Mieg D, Kreil DP, Mason CE, Tong W, and Shi L
- Subjects
- Humans, Quality Control, Reference Standards, Gene Expression Profiling methods, Gene Expression Profiling standards, RNA genetics, Sequence Analysis, RNA, Transcriptome
- Abstract
Whole-transcriptome sequencing ('RNA-Seq') has been drastically changing the scale and scope of genomic research. In order to fully understand the power and limitations of this technology, the US Food and Drug Administration (FDA) launched the third phase of the MicroArray Quality Control (MAQC-III) project, also known as the SEquencing Quality Control (SEQC) project. Using two well-established human reference RNA samples from the first phase of the MAQC project, three sequencing platforms were tested across more than ten sites with built-in truths including spike-in of external RNA controls (ERCC), titration data and qPCR verification. The SEQC project generated over 30 billion sequence reads representing the largest RNA-Seq data ever generated by a single project on individual RNA samples. This extraordinarily ultradeep transcriptomic data set and the known truths built into the study design provide many opportunities for further research and development to advance the improvement and application of RNA-Seq.
- Published
- 2014
- Full Text
- View/download PDF
49. Comprehensive RNA-Seq transcriptomic profiling across 11 organs, 4 ages, and 2 sexes of Fischer 344 rats.
- Author
-
Yu Y, Zhao C, Su Z, Wang C, Fuscoe JC, Tong W, and Shi L
- Subjects
- Age Factors, Animals, Female, Male, Organ Specificity, Sequence Analysis, RNA, Sex Factors, Gene Expression Profiling, RNA genetics, Rats, Inbred F344, Transcriptome
- Abstract
The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.
- Published
- 2014
- Full Text
- View/download PDF
50. Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population.
- Author
-
Zhang W, Meehan J, Su Z, Ng HW, Shu M, Luo H, Ge W, Perkins R, Tong W, and Hong H
- Subjects
- Disease genetics, Gene Ontology, Genetic Association Studies, Genome, Human, High-Throughput Nucleotide Sequencing, Humans, Korea, Mutation, Sequence Alignment, Sequence Analysis, DNA, Software, Asian People genetics, Polymorphism, Single Nucleotide
- Abstract
Background: Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. Methods: In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. Results: We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean-only SNVs are in genes that are associated with the drug term of adenosine. Conclusion: We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
- Published
- 2014
- Full Text
- View/download PDF