417 results on '"Huixiao Hong"'
Search Results
52. Machine Learning and Deep Learning Promote Computational Toxicology for Risk Assessment of Chemicals
- Author
-
Rebecca Kusko and Huixiao Hong
- Published
- 2023
- Full Text
- View/download PDF
53. Genomic Discoveries and Personalized Medicine in Neurological Diseases
- Author
-
Li Zhang and Huixiao Hong
- Subjects
genomics ,personalized medicine ,neurological disease ,Pharmacy and materia medica ,RS1-441
- Abstract
In the past decades, we have witnessed dramatic changes in clinical diagnoses and treatments due to the revolutions of genomics and personalized medicine. Undoubtedly, we have also met many challenges in using those advanced technologies in drug discovery and development. In this review, we describe when genomic information is applied in personal healthcare in general. We illustrate some case examples of genomic discoveries and promising personalized medicine applications in the area of neurological disease in particular. Available data suggest that individual genomics can be applied to better treat patients in the near future.
- Published
- 2015
- Full Text
- View/download PDF
54. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine
- Author
-
Hao Ye, Joe Meehan, Weida Tong, and Huixiao Hong
- Subjects
precision medicine ,next-generation sequencing ,genetic variants ,alignment ,short reads ,Pharmacy and materia medica ,RS1-441
- Abstract
Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long-read alignment and alignment facilitated with known genetic variants.
- Published
- 2015
- Full Text
- View/download PDF
55. Machine Learning Methods for Predicting HLA–Peptide Binding Activity
- Author
-
Heng Luo, Hao Ye, Hui Wen Ng, Leming Shi, Weida Tong, Donna L. Mendrick, and Huixiao Hong
- Subjects
Biology (General) ,QH301-705.5
- Published
- 2015
56. Accurate Prediction and Recognition of Subfamilies of G Protein-Coupled Receptors from Amino Acid Sequences.
- Author
-
Huixiao Hong, Qilong Hong, and Weida Tong
- Published
- 2009
57. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
- Author
-
Kenneth Idler, Andreas Scherer, Charles Lu, Timothy K. McDaniel, Penelope Duerken-Hughes, K J. Langenbach, Seta Stanbouly, Charles Wang, Victoria Zismann, Keyur Talsania, Leming Shi, Margaret C. Cam, Shamoni Maheshwari, Zhipan Li, Luyao Ren, Petr Vojta, Mehdi Pirooznia, Jonathan J Keats, Rasika Kalamegham, Howard Jacob, Bao Tran, Liz Kerrigan, Baitang Ning, Ene Reimann, Jiri Drabek, Eric F. Donaldson, Zhaowei Yang, Sayed Mohammad Ebrahim Sahraeian, Daoud Meerzaman, Marc Sultan, Jessica Nordlund, Tsai-wei Shen, Sulev Kõks, Christopher E. Mason, Yunfei Guo, Winnie S. Liang, Claudia Catalanotti, Jeffrey M. Trent, Ying Yu, Roderick V. Jensen, Huixiao Hong, Malcolm Moos, Wenming Xiao, Stephen T. Sherry, Jonathan Foox, Joe Shuga, Hugo Y. K. Lam, Chunlin Xiao, Lijing Yao, Li Tai Fang, Wanqiu Chen, Marghoob Mohiyuddin, Monika Mehta, Rebecca Kusko, Roberta Maestro, Yongmei Zhao, Jonathan Adkins, Gary P. Schroth, Daniel Butler, Yuliya Kriga, Ogan D Abaan, Erich Jaeger, Yuanting Zheng, Daniela Gasparotto, Ulrika Liljedahl, Tiffany Hung, Eric Peters, Erica Tassone, Maryellen de Mars, Cu Nguyen, Lei Song, Bin Zhu, Weida Tong, Zivana Tezak, Justin B. Lack, Virginie Petitjean, Jyoti Shetty, Jing Li, and Zhong Chen
- Subjects
DNA Mutational Analysis ,Biomedical Engineering ,Datasets as Topic ,Breast Neoplasms ,Bioengineering ,Genomics ,Computational biology ,Biology ,Applied Microbiology and Biotechnology ,Somatic evolution in cancer ,Genome ,Article ,Germline ,Cell Line, Tumor ,medicine ,Humans ,Whole genome sequencing ,Whole Genome Sequencing ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Cancer ,Benchmarking ,Reference Standards ,medicine.disease ,genomic DNA ,Germ Cells ,Mutation ,Molecular Medicine ,Biotechnology
- Abstract
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor–normal genomic DNA (gDNA) samples derived from a breast cancer cell line—which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations—and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking ‘tumor-only’ or ‘matched tumor–normal’ analyses.
- Published
- 2021
- Full Text
- View/download PDF
58. Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing
- Author
-
Maryellen de Mars, Cu Nguyen, Tiffany Hung, Eric Peters, Charles Lu, Meijian Guan, Bao Tran, Maurizio Polano, Bin Zhu, Samir Lababidi, Wendell D. Jones, Chunlin Xiao, Andreas Scherer, K J. Langenbach, Zhipan Li, Luyao Ren, Weida Tong, Erich Jaeger, Rebecca Kusko, Zivana Tezak, Ying Yu, Ulrika Liljedahl, Louis M. Staudt, Huixiao Hong, Jing Wang, Yuanting Zheng, Ali Moshrefi, Cristobal Juan Vera, Chris Miller, Rasika Kalamegham, Arati Raziuddin, Howard Jacob, Roberta Maestro, Bindu Swapna Madala, Petr Vojta, Jessica Nordlund, Li Tai Fang, Jiri Drabek, Xuelu Liu, Corey Miles, Gary P. Schroth, Fayaz Seifuddin, Tim R. Mercer, Chunhua Yan, Leihong Wu, Sulev Kõks, Roderick V. Jensen, Jennifer A Hipp, Yun-Ching Chen, Malcolm Moos, Yongmei Zhao, Baitang Ning, Aparna Natarajan, Brian N. Papas, Xin Chen, Ashley Walton, Stephen T. Sherry, Christopher E. Mason, Liz Kerrigan, Ogan D Abaan, Wanqiu Chen, Kenneth Idler, Jingya Wang, Tsai-wei Shen, James C. Willey, Ene Reimann, Justin B. Lack, Virginie Petitjean, Jyoti Shetty, Daoud Meerzaman, Charles Wang, Jian-Liang Li, Tiffany Truong, Keyur Talsania, Mehdi Pirooznia, Marc Sultan, Urvashi Mehra, Wenming Xiao, Zhong Chen, Ana Granat, Leming Shi, Margaret C. Cam, Qing-Rong Chen, Eric F. Donaldson, Wolfgang Resch, Ben Ernest, Yuliya Kriga, Gokhan Yavas, Thomas M. Blomquist, and Parthav Jailwala
- Subjects
Computer science ,Sequence analysis ,Biomedical Engineering ,Bioengineering ,Computational biology ,Applied Microbiology and Biotechnology ,Genome ,Article ,Cell Line ,Cell Line, Tumor ,Neoplasms ,Exome Sequencing ,medicine ,Humans ,Mutation detection ,Exome sequencing ,Protocol (science) ,Reproducibility ,Whole Genome Sequencing ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Cancer ,Sequence Analysis, DNA ,medicine.disease ,Benchmarking ,Mutation ,Mutation (genetic algorithm) ,Molecular Medicine ,Biotechnology
- Abstract
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor–normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
- Published
- 2021
- Full Text
- View/download PDF
59. Quartet RNA reference materials and ratio-based reference datasets for reliable transcriptomic profiling
- Author
-
Ying Yu, Wanwan Hou, Haiyan Wang, Lianhua Dong, Yaqing Liu, Shanyue Sun, Jingcheng Yang, Zehui Cao, Peipei Zhang, Yi Zi, Zhihui Li, Ruimei Liu, Jian Gao, Qingwang Chen, Naixin Zhang, Jingjing Li, Luyao Ren, He Jiang, Jun Shang, Sibo Zhu, Xiaolin Wang, Tao Qing, Ding Bao, Bingying Li, Bin Li, Chen Suo, Yan Pi, Xia Wang, Fangping Dai, Andreas Scherer, Pirkko Mattila, Jinxiong Han, Lijun Zhang, Hui Jiang, Danielle Thierry-Mieg, Jean Thierry-Mieg, Wenming Xiao, Huixiao Hong, Weida Tong, Jing Wang, Jinming Li, Xiang Fang, Li Jin, Joshua Xu, Feng Qian, Rui Zhang, Yuanting Zheng, and Leming Shi
- Abstract
As an indispensable tool for transcriptome-wide analysis of differential gene expression, RNA sequencing (RNAseq) has demonstrated great potential in clinical applications such as companion diagnostics and prognostics. However, there is a lack of certified RNA reference materials and the corresponding reference datasets of differential expression for assessing the reliability of RNAseq for its intended use in detecting intrinsically small biological differences in clinical settings such as molecular subtyping of diseases. As part of the Quartet Project for quality control and data integration of multiomics profiling, we established four RNA reference materials derived from immortalized B-lymphoblastoid cell lines from four members of a monozygotic twin family. Additionally, we constructed ratio-based transcriptome-wide reference datasets between two reference samples, providing "ground truth" for cross-platform and cross-lab proficiency test. Moreover, Quartet-sample-based quality metrics were developed for assessing the reliability of RNAseq technology in terms of intra-batch measurement and cross-batch data integration. The small intrinsic biological differences among the Quartet samples enabled sensitive assessment of performance of transcriptomic measurements and their cross-batch integration at the ratio level. The Quartet RNA reference materials combined with the ratio-based reference datasets can serve as unique resources for assessing data quality and improving reliability of transcriptomic profiling.
- Published
- 2022
- Full Text
- View/download PDF
60. Ratio-based quantitative multiomics profiling using universal reference materials empowers data integration
- Author
-
Yuanting Zheng, Yaqing Liu, Jingcheng Yang, Lianhua Dong, Rui Zhang, Sha Tian, Ying Yu, Luyao Ren, Wanwan Hou, Feng Zhu, Yuanbang Mai, Jinxiong Han, Lijun Zhang, Hui Jiang, Ling Lin, Jingwei Lou, Ruiqiang Li, Jingchao Lin, Huafen Liu, Ziqing Kong, Depeng Wang, Fangping Dai, Ding Bao, Zehui Cao, Qiaochu Chen, Qingwang Chen, Xingdong Chen, Yuechen Gao, He Jiang, Bin Li, Bingying Li, Jingjing Li, Ruimei Liu, Tao Qing, Erfei Shang, Jun Shang, Shanyue Sun, Haiyan Wang, Xiaolin Wang, Naixin Zhang, Peipei Zhang, Ruolan Zhang, Sibo Zhu, Andreas Scherer, Jiucun Wang, Jing Wang, Joshua Xu, Huixiao Hong, Wenming Xiao, Xiaozhen Liang, Li Jin, Weida Tong, Chen Ding, Jinming Li, Xiang Fang, and Leming Shi
- Abstract
Multiomics profiling is a powerful tool to characterize the same samples with complementary features orchestrating the genome, epigenome, transcriptome, proteome, and metabolome. However, the lack of ground truth hampers the objective assessment of and subsequent choice from a plethora of measurement and computational methods aiming to integrate diverse and often enigmatically incomparable omics datasets. Here we establish and characterize the first suites of publicly available multiomics reference materials of matched DNA, RNA, proteins, and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters, providing built-in truth defined by family relationship and the central dogma. We demonstrate that the "ratio"-based omics profiling data, i.e., by scaling the absolute feature values of a study sample relative to those of a concurrently measured universal reference sample, were inherently much more reproducible and comparable across batches, labs, platforms, and omics types, thus empower the horizontal (within-omics) and vertical (cross-omics) data integration in multiomics studies. Our study identifies "absolute" feature quantitation as the root cause of irreproducibility in multiomics measurement and data integration, and urges a paradigm shift from "absolute" to "ratio"-based multiomics profiling with universal reference materials.
- Published
- 2022
- Full Text
- View/download PDF
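The ratio-based profiling described in the abstract of record 60 can be illustrated with a minimal Python sketch. It assumes each sample is a simple feature-to-value mapping and that the ratio is taken on a log2 scale; the function name `ratio_profile`, the log2 choice, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def ratio_profile(study, reference):
    """Scale a study sample against a reference sample measured in the same batch.

    Returns log2(study / reference) for every feature present in both samples.
    The log2 scale is an assumption made for this sketch.
    """
    return {
        feature: math.log2(value / reference[feature])
        for feature, value in study.items()
        if feature in reference and value > 0 and reference[feature] > 0
    }


# Hypothetical data: batch 2 reads every feature four times higher than batch 1.
batch1_study = {"geneA": 20.0, "geneB": 5.0}
batch1_ref = {"geneA": 10.0, "geneB": 10.0}
batch2_study = {"geneA": 80.0, "geneB": 20.0}
batch2_ref = {"geneA": 40.0, "geneB": 40.0}

r1 = ratio_profile(batch1_study, batch1_ref)
r2 = ratio_profile(batch2_study, batch2_ref)
print(r1 == r2)  # the batch-wide scaling factor cancels in the ratio
```

In this toy case the absolute values differ between batches, but the ratios agree because the reference sample is measured alongside the study sample in each batch.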
61. Correcting batch effects in large-scale multiomic studies using a reference-material-based ratio method
- Author
-
Ying Yu, Naixin Zhang, Yuanbang Mai, Qiaochu Chen, Zehui Cao, Qingwang Chen, Yaqing Liu, Luyao Ren, Wanwan Hou, Jingcheng Yang, Huixiao Hong, Joshua Xu, Weida Tong, Leming Shi, and Yuanting Zheng
- Abstract
Batch effects are notorious technical variations that are common in multiomic data and may result in misleading outcomes. With the era of big data, tackling batch effects in multiomic integration is urgently needed. As part of the Quartet Project for quality control and data integration of multiomic profiling, we comprehensively assess the performances of seven batch-effect correction algorithms (BECAs) for mitigating the negative impact of batch effects in multiomic datasets, including transcriptomics, proteomics, and metabolomics. Performances are evaluated based on accuracy of identifying differentially expressed features, robustness of predictive models, and the ability to accurately cluster cross-batch samples into their biological sample groups. The ratio-based method is more effective and widely applicable than others, especially in cases when batch effects are highly confounded with biological factors of interest. We further provide practical guidelines for the implementation of the ratio-based method using universal reference materials profiled with study samples. Our findings show the promise for eliminating batch effects and enhancing data integration in increasingly large-scale, cross-batch multiomic studies.
- Published
- 2022
- Full Text
- View/download PDF
62. The Quartet Data Portal: integration of community-wide resources for multiomics quality control
- Author
-
Jingcheng Yang, Yaqing Liu, Jun Shang, Qiaochu Chen, Qingwang Chen, Luyao Ren, Naixin Zhang, Ying Yu, Zhihui Li, Yueqiang Song, Shengpeng Yang, Andreas Scherer, Weida Tong, Huixiao Hong, Leming Shi, Wenming Xiao, and Yuanting Zheng
- Abstract
The implementation of quality control for multiomic data requires the widespread use of well-characterized reference materials, reference datasets, and related resources. The Quartet Data Portal was built to facilitate community access to such rich resources established in the Quartet Project. A convenient platform is provided for users to request the DNA, RNA, protein, and metabolite reference materials, as well as multi-level datasets generated across omics, platforms, labs, protocols, and batches. Interactive visualization tools are offered to assist users to gain a quick understanding of the reference datasets. Crucially, the Quartet Data Portal continuously collects, evaluates, and integrates the community-generated data of the distributed Quartet multiomic reference materials. In addition, the portal provides analysis pipelines to assess the quality of user-submitted multiomic data. Furthermore, the reference datasets, performance metrics, and analysis pipelines will be improved through periodic review and integration of multiomic data submitted by the community. Effective integration of the evolving technologies via active interactions with the community will help ensure the reliability of multiomics-based biological discoveries. The Quartet Data Portal is accessible at https://chinese-quartet.org.
- Published
- 2022
- Full Text
- View/download PDF
63. Informing selection of drugs for COVID-19 treatment through adverse events analysis
- Author
-
Tucker A. Patterson, Huixiao Hong, Wenjing Guo, Takashi E. Komatsu, Yanhui Lu, Bohu Pan, Gokhan Yavas, Weida Tong, Madhu Lal-Nag, Sugunadevi Sakkiah, and Zuowei Ji
- Subjects
0301 basic medicine ,Drug ,medicine.medical_specialty ,2019-20 coronavirus outbreak ,Databases, Factual ,Coronavirus disease 2019 (COVID-19) ,media_common.quotation_subject ,Science ,MEDLINE ,Article ,Databases ,03 medical and health sciences ,0302 clinical medicine ,Pandemic ,Product Surveillance, Postmarketing ,Adverse Drug Reaction Reporting Systems ,Humans ,Medicine ,Drug safety ,Intensive care medicine ,Adverse effect ,Data mining ,media_common ,Clinical Trials as Topic ,Multidisciplinary ,business.industry ,COVID-19 Drug Treatment ,Data processing ,Clinical trial ,030104 developmental biology ,Investigational Drugs ,Safety ,business ,030217 neurology & neurosurgery
- Abstract
Coronavirus disease 2019 (COVID-19) is an ongoing pandemic and there is an urgent need for safe and effective drugs for COVID-19 treatment. Since developing a new drug is time consuming, many approved or investigational drugs have been repurposed for COVID-19 treatment in clinical trials. Therefore, selection of safe drugs for COVID-19 patients is vital for combating this pandemic. Our goal was to evaluate the safety concerns of drugs by analyzing adverse events reported in post-market surveillance. We collected 296 drugs that have been evaluated in clinical trials for COVID-19 and identified 28,597,464 associated adverse events at the system organ classes (SOCs) level in the FDA adverse events report systems (FAERS). We calculated Z-scores of SOCs that statistically quantify the relative frequency of adverse events of drugs in FAERS to quantitatively measure safety concerns for the drugs. Analyzing the Z-scores revealed that these drugs are associated with different significantly frequent adverse events. Our results suggest that this safety concern metric may serve as a tool to inform selection of drugs with favorable safety profiles for COVID-19 patients in clinical practices. Caution is advised when administering drugs with high Z-scores to patients who are vulnerable to associated adverse events.
- Published
- 2021
64. Correction to: Similarities and differences between variants called with human reference genome HG19 or HG38
- Author
-
Bohu Pan, Rebecca Kusko, Wenming Xiao, Yuanting Zheng, Zhichao Liu, Chunlin Xiao, Sugunadevi Sakkiah, Wenjing Guo, Ping Gong, Chaoyang Zhang, Weigong Ge, Leming Shi, Weida Tong, and Huixiao Hong
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5
- Abstract
After publication of this supplement article
- Published
- 2019
- Full Text
- View/download PDF
65. Estimating relative noise to signal in DNA microarray data.
- Author
-
Huixiao Hong, Qilong Hong, Jie Liu, Weida Tong, and Leming Shi
- Published
- 2013
- Full Text
- View/download PDF
66. BPA Replacement Compounds: Current Status and Perspectives
- Author
-
Sugunadevi Sakkiah, Jie Liu, Wenjing Guo, Zuowei Ji, and Huixiao Hong
- Subjects
Renewable Energy, Sustainability and the Environment ,business.industry ,General Chemical Engineering ,Physiology ,02 engineering and technology ,General Chemistry ,010402 general chemistry ,021001 nanoscience & nanotechnology ,01 natural sciences ,0104 chemical sciences ,Human health ,Environmental Chemistry ,Medicine ,0210 nano-technology ,business
- Abstract
Since growing evidence has manifested that bisphenol A (BPA) may adversely affect human health, numerous BPA replacement compounds have been gradually introduced into the industry. Although BPA rep...
- Published
- 2021
- Full Text
- View/download PDF
67. The Accurate Prediction of Protein Family from Amino Acid Sequence by Measuring Features of Sequence Fragments.
- Author
-
Huixiao Hong, Qilong Hong, Roger Perkins, Leming M. Shi, Hong Fang, Zhenqiang Su, Yvonne P. Dragan, James C. Fuscoe, and Weida Tong
- Published
- 2009
- Full Text
- View/download PDF
68. A comprehensive rat transcriptome built from large scale RNA-seq-based annotation
- Author
-
Yongxiang Zhao, Zhichao Liu, Jian He, Tieliu Shi, James C. Fuscoe, Xiangjun Ji, Jinghua Liu, Paweł P. Łabaj, Wenjun Bao, David P. Kreil, Geng Chen, Leming Shi, Peng Li, Liping Zhong, Baitang Ning, Wenzhong Xiao, Jun Wu, Weida Tong, Yong Huang, Huixiao Hong, and Lei Guo
- Subjects
Gene isoform ,AcademicSubjects/SCI00010 ,ved/biology.organism_classification_rank.species ,RNA-Seq ,Computational biology ,Data Resources and Analyses ,Biology ,Genome ,Transcriptome ,03 medical and health sciences ,Exon ,0302 clinical medicine ,Gene expression ,Exome Sequencing ,Genetics ,Animals ,Humans ,Model organism ,Gene ,030304 developmental biology ,0303 health sciences ,ved/biology ,High-Throughput Nucleotide Sequencing ,Molecular Sequence Annotation ,Rats ,Alternative Splicing ,030217 neurology & neurosurgery
- Abstract
The rat is an important model organism in biomedical research for studying human disease mechanisms and treatments, but its annotated transcriptome is far from complete. We constructed a Rat Transcriptome Re-annotation named RTR using RNA-seq data from 320 samples in 11 different organs generated by the SEQC consortium. Totally, there are 52 807 genes and 114 152 transcripts in RTR. Transcribed regions and exons in RTR account for ∼42% and ∼6.5% of the genome, respectively. Of all 73 074 newly annotated transcripts in RTR, 34 213 were annotated as high confident coding transcripts and 24 728 as high confident long noncoding transcripts. Different tissues rather than different stages have a significant influence on the expression patterns of transcripts. We also found that 11 715 genes and 15 852 transcripts were expressed in all 11 tissues and that 849 house-keeping genes expressed different isoforms among tissues. This comprehensive transcriptome is freely available at http://www.unimd.org/rtr/. Our new rat transcriptome provides essential reference for genetics and gene expression studies in rat disease and toxicity models.
- Published
- 2020
69. Modeling Chemical Interaction Profiles: II. Molecular Docking, Spectral Data-Activity Relationship, and Structure-Activity Relationship Models for Potent and Weak Inhibitors of Cytochrome P450 CYP3A4 Isozyme
- Author
-
Eugene Demchuk, Bruce A. Fowler, James C. Fuscoe, Richard D. Beger, Weida Tong, Jon G. Wilkes, Dan A. Buzatu, Weigong Ge, Laura K. Schnackenberg, Yunfeng Tie, Brooks McPhail, Huixiao Hong, and Bruce A. Pearce
- Subjects
structure-activity relationship ,SAR ,QSAR ,SDAR ,docking ,molecular modeling ,inhibitor ,CYP3A4 ,drug-drug interaction ,drug-chemical interaction ,DDI ,DDCI ,Organic chemistry ,QD241-441
- Abstract
Polypharmacy increasingly has become a topic of public health concern, particularly as the U.S. population ages. Drug labels often contain insufficient information to enable the clinician to safely use multiple drugs. Because many of the drugs are bio-transformed by cytochrome P450 (CYP) enzymes, inhibition of CYP activity has long been associated with potentially adverse health effects. In an attempt to reduce the uncertainty pertaining to CYP-mediated drug-drug/chemical interactions, an interagency collaborative group developed a consensus approach to prioritizing information concerning CYP inhibition. The consensus involved computational molecular docking, spectral data-activity relationship (SDAR), and structure-activity relationship (SAR) models that addressed the clinical potency of CYP inhibition. The models were built upon chemicals that were categorized as either potent or weak inhibitors of the CYP3A4 isozyme. The categorization was carried out using information from clinical trials because currently available in vitro high-throughput screening data were not fully representative of the in vivo potency of inhibition. During categorization it was found that compounds, which break the Lipinski rule of five by molecular weight, were about twice more likely to be inhibitors of CYP3A4 compared to those, which obey the rule. Similarly, among inhibitors that break the rule, potent inhibitors were 2–3 times more frequent. The molecular docking classification relied on logistic regression, by which the docking scores from different docking algorithms, CYP3A4 three-dimensional structures, and binding sites on them were combined in a unified probabilistic model. The SDAR models employed a multiple linear regression approach applied to binned 1D 13C-NMR and 1D 15N-NMR spectral descriptors. Structure-based and physical-chemical descriptors were used as the basis for developing SAR models by the decision forest method. 
Thirty-three potent inhibitors and 88 weak inhibitors of CYP3A4 were used to train the models. Using these models, a synthetic majority rules consensus classifier was implemented, while the confidence of estimation was assigned following the percent agreement strategy. The classifier was applied to a testing set of 120 inhibitors not included in the development of the models. Five compounds of the test set, including known strong inhibitors dalfopristin and tioconazole, were classified as probable potent inhibitors of CYP3A4. Other known strong inhibitors, such as lopinavir, oltipraz, quercetin, raloxifene, and troglitazone, were among 18 compounds classified as plausible potent inhibitors of CYP3A4. The consensus estimation of inhibition potency is expected to aid in the nomination of pharmaceuticals, dietary supplements, environmental pollutants, and occupational and other chemicals for in-depth evaluation of the CYP3A4 inhibitory activity. It may serve also as an estimate of chemical interactions via CYP3A4 metabolic pharmacokinetic pathways occurring through polypharmacy and nutritional and environmental exposures to chemical mixtures.
- Published
- 2012
- Full Text
- View/download PDF
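The majority-rules consensus with percent-agreement confidence mentioned in the abstract of record 69 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `consensus`, the model labels, and the example votes are hypothetical, and the sketch does not reproduce the paper's docking, SDAR, or SAR models.

```python
from collections import Counter


def consensus(votes):
    """Return (majority label, fraction of models that agree with it).

    `votes` is a non-empty list with one class label per model. With an odd
    number of models and two classes, a tie cannot occur.
    """
    counts = Counter(votes)
    label, n_agree = counts.most_common(1)[0]
    return label, n_agree / len(votes)


# Hypothetical votes from three independent models for one compound.
votes = {"docking": "potent", "SDAR": "weak", "SAR": "potent"}
label, agreement = consensus(list(votes.values()))
print(label, round(agreement, 2))
```

Here two of the three hypothetical models agree, so the compound is labeled "potent" with an agreement of about 0.67; a unanimous vote would give an agreement of 1.0.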
70. Modeling Chemical Interaction Profiles: I. Spectral Data-Activity Relationship and Structure-Activity Relationship Models for Inhibitors and Non-inhibitors of Cytochrome P450 CYP3A4 and CYP2D6 Isozymes
- Author
-
Richard D. Beger, Eugene Demchuk, Bruce A. Fowler, Dan A. Buzatu, Jon G. Wilkes, Weida Tong, James C. Fuscoe, Luis G. Valerio, Huixiao Hong, Weigong Ge, Laura K. Schnackenberg, Bruce A. Pearce, Yunfeng Tie, and Brooks McPhail
- Subjects
structure-activity relationship ,SAR ,SDAR ,classifier ,cytochrome P450 ,inhibitor ,CYP3A4 ,CYP2D6 ,Organic chemistry ,QD241-441
- Abstract
An interagency collaboration was established to model chemical interactions that may cause adverse health effects when an exposure to a mixture of chemicals occurs. Many of these chemicals—drugs, pesticides, and environmental pollutants—interact at the level of metabolic biotransformations mediated by cytochrome P450 (CYP) enzymes. In the present work, spectral data-activity relationship (SDAR) and structure-activity relationship (SAR) approaches were used to develop machine-learning classifiers of inhibitors and non-inhibitors of the CYP3A4 and CYP2D6 isozymes. The models were built upon 602 reference pharmaceutical compounds whose interactions have been deduced from clinical data, and 100 additional chemicals that were used to evaluate model performance in an external validation (EV) test. SDAR is an innovative modeling approach that relies on discriminant analysis applied to binned nuclear magnetic resonance (NMR) spectral descriptors. In the present work, both 1D 13C and 1D 15N-NMR spectra were used together in a novel implementation of the SDAR technique. It was found that increasing the binning size of 1D 13C-NMR and 15N-NMR spectra caused an increase in the tenfold cross-validation (CV) performance in terms of both the rate of correct classification and sensitivity. The results of SDAR modeling were verified using SAR. For SAR modeling, a decision forest approach involving from 6 to 17 Mold2 descriptors in a tree was used. Average rates of correct classification of SDAR and SAR models in a hundred CV tests were 60% and 61% for CYP3A4, and 62% and 70% for CYP2D6, respectively. The rates of correct classification of SDAR and SAR models in the EV test were 73% and 86% for CYP3A4, and 76% and 90% for CYP2D6, respectively. Thus, both SDAR and SAR methods demonstrated a comparable performance in modeling a large set of structurally diverse data. 
Based on unique NMR structural descriptors, the new SDAR modeling method complements the existing SAR techniques, providing an independent estimator that can increase confidence in a structure-activity assessment. When modeling was applied to hazardous environmental chemicals, it was found that up to 20% of them may be substrates and up to 10% of them may be inhibitors of the CYP3A4 and CYP2D6 isoforms. The developed models provide a rare opportunity for the environmental health branch of the public health service to extrapolate to hazardous chemicals directly from human clinical data. Therefore, the pharmacological and environmental health branches are both expected to benefit from these reported models.
- Published
- 2012
- Full Text
- View/download PDF
71. Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics.
- Author
-
Huixiao Hong, Qian Xie, Weigong Ge, Feng Qian, Hong Fang, Leming Shi, Zhenqiang Su, Roger Perkins, and Weida Tong
- Published
- 2008
- Full Text
- View/download PDF
72. Consensus analysis of multiple classifiers using non-repetitive variables: Diagnostic application to microarray gene expression data.
- Author
-
Zhenqiang Su, Huixiao Hong, Roger Perkins, Xueguang Shao, Wensheng Cai, and Weida Tong
- Published
- 2007
- Full Text
- View/download PDF
73. Development of Nicotinic Acetylcholine Receptor nAChR α7 Binding Activity Prediction Model: Coupling Machine Learning with Competitive Molecular Docking.
- Author
-
Huixiao Hong, Carmine Leggett, and Suguna Sakkiah
- Published
- 2017
- Full Text
- View/download PDF
74. Spec2D: A Structure Elucidation System Based on 1H NMR and H-H COSY Spectra in Organic Chemistry.
- Author
-
Hideyuki Masui and Huixiao Hong
- Published
- 2006
- Full Text
- View/download PDF
75. Machine Learning Models for Predicting Liver Toxicity
- Author
-
Jie Liu, Wenjing Guo, Sugunadevi Sakkiah, Zuowei Ji, Gokhan Yavas, Wen Zou, Minjun Chen, Weida Tong, Tucker A. Patterson, and Huixiao Hong
- Subjects
Machine Learning ,Drug-Related Side Effects and Adverse Reactions ,Drug Discovery ,Animals ,Humans ,Hepatitis
- Abstract
Liver toxicity is a major adverse drug reaction that accounts for drug failure in clinical trials and withdrawal from the market. Therefore, predicting potential liver toxicity at an early stage in drug discovery is crucial to reduce costs and the potential for drug failure. However, current in vivo animal toxicity testing is very expensive and time consuming. As an alternative approach, various machine learning models have been developed to predict potential liver toxicity in humans. This chapter reviews current advances in the development and application of machine learning models for prediction of potential liver toxicity in humans and discusses possible improvements to liver toxicity prediction.
- Published
- 2022
76. Machine Learning Models for Predicting Cytotoxicity of Nanomaterials
- Author
-
Zuowei Ji, Wenjing Guo, Erin L. Wood, Jie Liu, Sugunadevi Sakkiah, Xiaoming Xu, Tucker A. Patterson, and Huixiao Hong
- Subjects
Machine Learning ,Cell Survival ,Humans ,General Medicine ,Toxicology ,Cell Line ,Nanostructures
- Abstract
The wide application of nanomaterials in consumer and medical products has raised concerns about their potential adverse effects on human health. Thus, more and more biological assessments regarding the toxicity of nanomaterials have been performed. However, the different ways the evaluations were performed, such as the utilized assays, cell lines, and the differences of the produced nanoparticles, make it difficult for scientists to analyze and effectively compare toxicities of nanomaterials. Fortunately, machine learning has emerged as a powerful tool for the prediction of nanotoxicity based on the available data. Among different types of toxicity assessments, nanomaterial cytotoxicity was the focus here because of the high sensitivity of cytotoxicity assessment to different treatments without the need for complicated and time-consuming procedures. In this review, we summarized recent studies that focused on the development of machine learning models for prediction of cytotoxicity of nanomaterials. The goal was to provide insight into predicting potential nanomaterial toxicity and promoting the development of safe nanomaterials.
- Published
- 2022
77. Machine Learning Models for Predicting Liver Toxicity
- Author
-
Jie Liu, Wenjing Guo, Sugunadevi Sakkiah, Zuowei Ji, Gokhan Yavas, Wen Zou, Minjun Chen, Weida Tong, Tucker A. Patterson, and Huixiao Hong
- Published
- 2022
78. Structures of Endocrine-Disrupting Chemicals Correlate with the Activation of 12 Classic Nuclear Receptors
- Author
-
Tan Haoyue, Giuseppina Gini, Huixiao Hong, Chen Qinchang, Xiaowei Zhang, Hongxia Yu, Wei Shi, and Emilio Benfenati
- Subjects
Virtual screening ,Thyroid hormone receptor ,medicine.drug_class ,Retinoic acid ,Receptors, Cytoplasmic and Nuclear ,General Chemistry ,Computational biology ,Endocrine Disruptors ,Androgen ,Molecular Docking Simulation ,chemistry.chemical_compound ,Nuclear receptor ,chemistry ,Estrogen ,medicine ,Environmental Chemistry ,Endocrine system ,Receptor
- Abstract
Endocrine-disrupting chemicals (EDCs) can inadvertently interact with 12 classic nuclear receptors (NRs) that disrupt the endocrine system and cause adverse effects. There is no widely accepted understanding about what structural features make thousands of EDCs able to activate different NRs as well as how these structural features exert their functions and induce different outcomes at the cellular level. This paper applies the hierarchical characteristic fragment methodology and high-throughput screening molecular docking to comprehensively explore the structural and functional features of EDCs for the 12 NRs based on more than 7000 chemicals from curated datasets. EDCs share three levels of key fragments. The primary and secondary fragments are associated with the binding of EDCs to four groups of receptors: steroidal nuclear receptors (SNRs, including androgen, estrogen, glucocorticoid, mineralocorticoid, and progesterone), retinoic acid receptors, thyroid hormone receptors, and vitamin D receptors. The tertiary fragments determine the activity type by interacting with two key locations in the ligand-binding domains of NRs (N-H5-H3-C and N-H7-H11-C for SNRs and N-H5-H5'-H2'-H3-C and N-H6'-H11-C for non-SNRs). The resulting compiled structural fragments of EDCs together with elucidated compound NR binding modes provide a framework for understanding the interactions between EDCs and NRs, facilitating faster and more accurate screening of EDCs for multiple NRs in the future.
- Published
- 2021
79. Distinct Conformations of SARS-CoV-2 Omicron Spike Protein and Its Interaction with ACE2 and Antibody
- Author
-
Myeongsang Lee, Marian Major, and Huixiao Hong
- Subjects
Inorganic Chemistry ,Organic Chemistry ,General Medicine ,Physical and Theoretical Chemistry ,Molecular Biology ,Spectroscopy ,Catalysis ,Computer Science Applications
- Abstract
Since November 2021, Omicron has been the dominant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant that causes the coronavirus disease 2019 (COVID-19) and has continuously impacted human health. Omicron sublineages are still increasing and cause increased transmission and infection rates. The additional 15 mutations on the receptor binding domain (RBD) of Omicron spike proteins change the protein conformation, enabling the Omicron variant to evade neutralizing antibodies. For this reason, many efforts have been made to design new antigenic variants to induce effective antibodies in SARS-CoV-2 vaccine development. However, understanding the different states of Omicron spike proteins with and without external molecules has not yet been addressed. In this review, we analyze the structures of the spike protein in the presence and absence of angiotensin-converting enzyme 2 (ACE2) and antibodies. Compared to previously determined structures for the wildtype spike protein and other variants such as alpha, beta, delta, and gamma, the Omicron spike protein adopts a partially open form. The open-form spike protein with one RBD up is dominant, followed by the open-form spike protein with two RBD up, and the closed-form spike protein with the RBD down. It is suggested that the competition between antibodies and ACE2 induces interactions between adjacent RBDs of the spike protein, which lead to a partially open form of the Omicron spike protein. The comprehensive structural information of Omicron spike proteins could be helpful for the efficient design of vaccines against the Omicron variant.
- Published
- 2023
80. Decision Forest: Combining the Predictions of Multiple Independent Decision Tree Models.
- Author
-
Weida Tong, Huixiao Hong, Hong Fang, Qian Xie, and Roger Perkins
- Published
- 2003
81. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing
- Author
-
Bohu Pan, Luyao Ren, Vitor Onuchic, Meijian Guan, Rebecca Kusko, Steve Bruinsma, Len Trigg, Andreas Scherer, Baitang Ning, Chaoyang Zhang, Christine Glidewell-Kenney, Chunlin Xiao, Eric Donaldson, Fritz J. Sedlazeck, Gary Schroth, Gokhan Yavas, Haiying Grunenwald, Haodong Chen, Heather Meinholz, Joe Meehan, Jing Wang, Jingcheng Yang, Jonathan Foox, Jun Shang, Kelci Miclaus, Lianhua Dong, Leming Shi, Marghoob Mohiyuddin, Mehdi Pirooznia, Ping Gong, Rooz Golshani, Russ Wolfinger, Samir Lababidi, Sayed Mohammad Ebrahim Sahraeian, Steve Sherry, Tao Han, Tao Chen, Tieliu Shi, Wanwan Hou, Weigong Ge, Wen Zou, Wenjing Guo, Wenjun Bao, Wenzhong Xiao, Xiaohui Fan, Yoichi Gondo, Ying Yu, Yongmei Zhao, Zhenqiang Su, Zhichao Liu, Weida Tong, Wenming Xiao, Justin M. Zook, Yuanting Zheng, Huixiao Hong, and Institute for Molecular Medicine Finland
- Subjects
11832 Microbiology and virology ,0303 health sciences ,318 Medical biotechnology ,Whole Genome Sequencing ,QH301-705.5 ,Genome, Human ,Research ,1184 Genetics, developmental biology, physiology ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,QH426-470 ,CANCER ,Polymorphism, Single Nucleotide ,GENOTYPE ,3. Good health ,03 medical and health sciences ,0302 clinical medicine ,DATA SETS ,INDEL Mutation ,030220 oncology & carcinogenesis ,Genetics ,Humans ,Biology (General) ,MUTATION ,030304 developmental biology
- Abstract
Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
- Published
- 2021
82. Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study
- Author
-
Bao Tran, Erich Jaeger, Sulbha Choudhari, Daniela Gasparotto, Yuliya Kriga, Sulev Kõks, Kenneth Idler, Keyur Talsania, Petr Vojta, Zhong Chen, Charles Wang, Jiri Drabek, Wanqiu Chen, Yuanting Zheng, Daoud Meerzaman, Christopher E. Mason, Roberta Maestro, Leming Shi, Ene Reimann, Tsai-wei Shen, Charles Lu, Jonathan Foox, Xiongfong Chen, Chunlin Xiao, Luyao Ren, Wenming Xiao, Tiffany Hung, Eric Peters, Marc Sultan, Andreas Scherer, Bin Zhu, Yongmei Zhao, Virginie Petitjean, Jyoti Shetty, Huixiao Hong, Jessica Nordlund, Ulrika Liljedahl, Li Tai Fang, and Institute for Molecular Medicine Finland
- Subjects
Statistics and Probability ,Data Descriptor ,Computer science ,SAMPLES ,Science ,3122 Cancers ,Genomics ,Context (language use) ,Computational biology ,Library and Information Sciences ,Genome ,DNA sequencing ,Education ,03 medical and health sciences ,0302 clinical medicine ,CANCER MUTATION DETECTION ,Cell Line, Tumor ,Neoplasms ,Exome Sequencing ,Humans ,Precision Medicine ,Exome sequencing ,030304 developmental biology ,Medicinsk genetik ,0303 health sciences ,Whole Genome Sequencing ,business.industry ,Genome, Human ,Computational Biology ,Benchmarking ,DNA ,Personalized medicine ,Standardization ,3. Good health ,Computer Science Applications ,Data processing ,Reference data ,3111 Biomedicine ,Statistics, Probability and Uncertainty ,business ,Medical Genetics ,030217 neurology & neurosurgery ,Information Systems
- Abstract
With the rapid advancement of sequencing technologies, next generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices in cancer mutation detection using NGS and to establish standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium context, we established paired tumor-normal reference samples and generated whole-genome (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/site WGS and WES datasets using well-characterized reference samples will represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies. Measurement(s): Somatic Mutation Analysis. Technology Type(s): whole genome sequencing • Whole Exome Sequencing. Factor Type(s): sequencing platform • sample preparation • library preparation • bioinformatics method. Sample Characteristic - Organism: Homo sapiens. Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16713655
- Published
- 2021
83. Applicability Domains Enhance Application of PPARγ Agonist Classifiers Trained by Drug-like Compounds to Environmental Chemicals
- Author
-
Huixiao Hong, Zhongyu Wang, and Jingwen Chen
- Subjects
Agonist ,Drug ,Quantitative structure–activity relationship ,medicine.drug_class ,Computer science ,In silico ,media_common.quotation_subject ,Quantitative Structure-Activity Relationship ,Computational biology ,010501 environmental sciences ,Toxicology ,01 natural sciences ,Pparγ agonist ,Cell Line ,03 medical and health sciences ,External data ,Cricetulus ,Chlorocebus aethiops ,medicine ,Animals ,Humans ,030304 developmental biology ,0105 earth and related environmental sciences ,media_common ,0303 health sciences ,Peroxisome proliferator ,General Medicine ,PPAR gamma ,Biological Assay ,Environmental Pollutants
- Abstract
Peroxisome proliferator-activated receptor gamma (PPARγ) agonist activity of chemicals is an indicator of health conditions of concern such as fatty liver and obesity. In silico screening of PPARγ agonists based on quantitative structure-activity relationship (QSAR) models could serve as an efficient and pragmatic strategy. Owing to the broad research interest in the discovery of PPARγ-targeted drugs, a large amount of PPARγ agonist activity data has been produced in the field of medicinal chemistry, facilitating development of robust QSAR models. In this study, random forest classifiers were developed based on the binary-category data transformed from the heterogeneous PPARγ agonist activity data of drug-like compounds. Coupled with applicability domains, the capability of the established classifiers for predicting environmental chemicals was evaluated using two external data sets. Our results demonstrated that applicability domains could enhance application of the developed classifiers to predict environmental PPARγ agonists.
- Published
- 2020
84. Elucidation of Agonist and Antagonist Dynamic Binding Patterns in ER-α by Integration of Molecular Docking, Molecular Dynamics Simulations and Quantum Mechanical Calculations
- Author
-
Weigong Ge, Wenjing Guo, Tucker A. Patterson, Jie Liu, Huixiao Hong, Sugunadevi Sakkiah, and Chandrabose Selvaraj
- Subjects
Agonist ,medicine.drug_class ,QH301-705.5 ,Estrogen receptor ,quantum mechanical calculations ,Molecular Dynamics Simulation ,Ligands ,Catalysis ,Article ,Inorganic Chemistry ,Molecular dynamics ,medicine ,Humans ,Physical and Theoretical Chemistry ,Binding site ,Biology (General) ,Molecular Biology ,Conformational isomerism ,QD1-999 ,Spectroscopy ,Estradiol ,Chemistry ,Organic Chemistry ,Antagonist ,Estrogen Receptor alpha ,Hydrogen Bonding ,General Medicine ,molecular docking ,molecular dynamics simulations ,Computer Science Applications ,Molecular Docking Simulation ,Tamoxifen ,Nuclear receptor ,dynamic binding pattern ,Biophysics ,Quantum Theory ,Estrogen receptor alpha ,Hydrophobic and Hydrophilic Interactions ,estrogen receptor
- Abstract
Estrogen receptor alpha (ERα) is a ligand-dependent transcription factor in the nuclear receptor superfamily. Many structures of ERα bound with agonists and antagonists have been determined. However, the dynamic binding patterns of agonists and antagonists in the binding site of ERα remain unclear. Therefore, we performed molecular docking, molecular dynamics (MD) simulations, and quantum mechanical calculations to elucidate agonist and antagonist dynamic binding patterns in ERα. 17β-estradiol (E2) and 4-hydroxytamoxifen (OHT) were docked in the ligand binding pockets of the agonist and antagonist bound ERα. The best complex conformations from molecular docking were subjected to 100 nanosecond MD simulations. Hierarchical clustering was conducted to group the structures in the trajectory from MD simulations. The representative structure from each cluster was selected to calculate the binding interaction energy value for elucidation of the dynamic binding patterns of agonists and antagonists in the binding site of ERα. The binding interaction energy analysis revealed that OHT binds ERα more tightly in the antagonist conformer, while E2 prefers the agonist conformer. The results may help identify ERα antagonists as drug candidates and facilitate risk assessment of chemicals through ER-mediated responses.
- Published
- 2021
85. Predictive Models to Identify Small Molecule Activators and Inhibitors of Opioid Receptors
- Author
-
Ruili Huang, Srilatha Sakamuru, Menghang Xia, Iosif I. Vaisman, Jinghua Zhao, Huixiao Hong, and Anton Simeonov
- Subjects
General Chemical Engineering ,High-throughput screening ,Pain ,Computational biology ,Library and Information Sciences ,01 natural sciences ,Article ,0103 physical sciences ,medicine ,Humans ,Receptor ,Opioid addiction ,Analgesics ,Training set ,010304 chemical physics ,Chemistry ,General Chemistry ,Experimental validation ,Small molecule ,0104 chemical sciences ,Computer Science Applications ,Analgesics, Opioid ,Molecular Docking Simulation ,010404 medicinal & biomolecular chemistry ,Opioid ,Receptors, Opioid ,Hit rate ,medicine.drug
- Abstract
Opioid receptors (OPRs) are the main targets for the treatment of pain and related disorders. The opiate compounds that activate these receptors are effective analgesics, but their use leads to adverse effects and they are often highly addictive drugs of abuse. There is an urgent need for alternative chemicals that are analgesic while reducing or avoiding the unwanted effects, in order to relieve the public health crisis of opioid addiction. Here, we aim to develop computational models to predict the OPR activity of small molecule compounds based on chemical structures and apply these models to identify novel OPR active compounds. We used four different machine learning algorithms to build models based on quantitative high throughput screening (qHTS) datasets of three OPRs in both agonist and antagonist modes. The best performing models were applied to virtually screen a large collection of compounds. The model-predicted active compounds were experimentally validated using the same qHTS assays that generated the training data. Random forest was the best classifier with the highest performance metrics, and the mu OPR (OPRM)-agonist model achieved the best performance with AUC-ROC (0.88) and MCC (0.7) values. The model-predicted actives resulted in hit rates ranging from 2.3% (delta OPR-agonist) to 15.8% (OPRM-agonist) after experimental confirmation. Compared to the original assay hit rate, all models enriched hit rate by ≥ 2-fold. Our approach produced robust OPR prediction models that can be applied to prioritize compounds identified from large libraries for further experimental validation. The models identified several novel potent compounds as activators/inhibitors of OPRs that were confirmed experimentally. The potent hits were further investigated using molecular docking to find the interactions of the novel ligands in the active site of the corresponding OPR.
- Published
- 2021
86. Developing QSAR Models with Defined Applicability Domains on PPARγ Binding Affinity Using Large Data Sets and Machine Learning Algorithms
- Author
-
Jingwen Chen, Huixiao Hong, and Zhongyu Wang
- Subjects
Quantitative structure–activity relationship ,Computer science ,Quantitative Structure-Activity Relationship ,Computational toxicology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Machine Learning ,Human health ,Similarity (network science) ,Environmental Chemistry ,Humans ,0105 earth and related environmental sciences ,business.industry ,Regression analysis ,General Chemistry ,PPAR gamma ,Pairwise comparison ,Artificial intelligence ,business ,Algorithm ,computer ,Algorithms ,Applicability domain ,Protein Binding
- Abstract
Chemicals may cause adverse effects on human health through binding to peroxisome proliferator-activated receptor γ (PPARγ). Hence, binding affinity is useful for evaluating chemicals with potential endocrine-disrupting effects. Quantitative structure-activity relationship (QSAR) regression models with defined applicability domains (ADs) are important to enable efficient screening of chemicals with PPARγ binding activity. However, the lack of large data sets has hindered the development of QSAR models. In this study, based on PPARγ binding affinity data sets curated from various sources, 30 QSAR models were developed using molecular fingerprints, two-dimensional descriptors, and five machine learning algorithms. Structure-activity landscapes (SALs) of the training compounds were described by network-like similarity graphs (NSGs). Based on the NSGs, local discontinuity scores were calculated and found to be positively correlated with the cross-validation absolute prediction errors of the models using the different training sets, descriptors, and algorithms. Moreover, innovative ADs were defined based on pairwise similarities between compounds and were found to outperform some conventional ADs. The curated data sets and developed regression models could be useful for evaluating PPARγ-involved adverse effects of chemicals. The SAL analysis and the innovative ADs could facilitate understanding of prediction results from QSAR models.
- Published
- 2021
87. Identification of Epidemiological Traits by Analysis of SARS−CoV−2 Sequences
- Author
-
Wenjing Guo, Jie Liu, Huixiao Hong, Bohu Pan, Tucker A. Patterson, Zuowei Ji, and Sugunadevi Sakkiah
- Subjects
0301 basic medicine ,Sequence analysis ,viruses ,Genome, Viral ,Biology ,pattern ,Genome ,Microbiology ,Article ,Virus ,law.invention ,03 medical and health sciences ,0302 clinical medicine ,Phylogenetics ,law ,Virology ,Pandemic ,Humans ,030212 general & internal medicine ,skin and connective tissue diseases ,Pandemics ,genome ,Phylogeny ,Base Sequence ,Phylogenetic tree ,SARS-CoV-2 ,SARS−CoV−2 ,phylogenetic analysis ,fungi ,virus diseases ,COVID-19 ,sequence ,QR1-502 ,Hierarchical clustering ,body regions ,030104 developmental biology ,Infectious Diseases ,Transmission (mechanics) ,Evolutionary biology ,epidemiological trait ,Databases, Nucleic Acid ,Sequence Analysis
- Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS−CoV−2) has caused the ongoing global COVID-19 pandemic that began in late December 2019. The rapid spread of SARS−CoV−2 is primarily due to person-to-person transmission. To understand the epidemiological traits of SARS−CoV−2 transmission, we conducted phylogenetic analysis on genome sequences from >54K SARS−CoV−2 cases obtained from two public databases. Hierarchical clustering analysis on geographic patterns in the resulting phylogenetic trees revealed a co-expansion tendency of the virus among neighboring countries with diverse sources and transmission routes for SARS−CoV−2. Pairwise sequence similarity analysis demonstrated that SARS−CoV−2 is transmitted locally and evolves during transmission. However, no significant differences were seen among SARS−CoV−2 genomes grouped by host age or sex. Here, our identified epidemiological traits provide information to better prevent transmission of SARS−CoV−2 and to facilitate the development of effective vaccines and therapeutics against the virus.
- Published
- 2021
88. Text Fingerprinting and Topic Mining in the Prescription Opioid Use Literature
- Author
-
Henry Francis, Junxiu Zhou, Weida Tong, Roger Perkins, Wen Zou, Beverly Lyn-Cook, Huixiao Hong, Huyen Le, Weizhong Zhao, and Weigong Ge
- Subjects
Text mining ,Computer science ,business.industry ,Prescription opioid ,Topic mining ,business ,Data science
- Abstract
Background: Prescription opioids are powerful pain-reducing medications, but they may cause a variety of adverse effects. Long-term prescription opioid use (POU) is contributing to an opioid-related epidemic of addiction and death, and the scope of the opioid crisis continues to expand. As such, there is a need to identify the adverse effects associated with POU. Thousands of articles that focus on POU and its associated medical disorders have been published. However, it is time-consuming and labor-intensive to extract and understand the information in all POU-related published articles. Methods: In this study, we applied the well-adapted topic modeling method, Latent Dirichlet Allocation (LDA), to perform text mining on POU-related literature. We compiled six large academic abstract datasets by searching PubMed using the Medical Subject Headings (MeSH): prescription opioid, codeine, morphine, hydrocodone, oxycodone, and methadone. We then applied topic modeling to identify topics and analyze topic similarities/differences in these six datasets. Word clouds and histograms were used to depict the distribution of vocabularies over each topic in which the most prevalent words conveyed a topic’s substance. Results: The LDA topics recaptured the search keywords in PubMed, and further revealed relevant themes, such as patients, drugs, side effects, and association links between different POU and risk factors, such as gender and age. Moreover, based on the topic modeling results, TreeMap was used to fingerprint abstracts, which revealed the possibility of constructing a visualized literature index by combining topic modeling and visualization tools such as TreeMap. Meanwhile, in a trend analysis exploring the prevalent topic dynamics in the POU-related literature, we found that the increasing trend in opioid prescription and its associated health risks are assessed as the most central issues. Conclusion: The topic modeling results presented in this study not only convey an understandable and thematic structure of the POU literature, but also provide a means to discover which documents contain information about medical disorders associated with POU, thus reducing the time and effort needed to review the literature for relevant articles. These results can be used as a preliminary study to systematically understand the risk factors related to increased POU-associated medical disorders.
- Published
- 2021
89. ESSESA: An expert system for structure elucidation from spectra, 6. Substructure constraints from analysis of 13C-NMR spectra.
- Author
-
Huixiao Hong, Han Yinling, Xin Xinquan, and Shi Yufeng
- Published
- 1995
90. Whole Exome Sequencing Reveals Genetic Variants in HLA Class II Genes Associated With Transplant-free Survival of Indeterminate Acute Liver Failure
- Author
-
Tsung-Jen Liao, Bohu Pan, Huixiao Hong, Paul Hayashi, Jody A. Rule, Daniel Ganger, William M. Lee, Jorge Rakela, and Minjun Chen
- Subjects
End Stage Liver Disease ,HLA-DRB5 Chains ,Genes, MHC Class II ,Sodium ,Exome Sequencing ,Gastroenterology ,Humans ,Liver Failure, Acute ,Severity of Illness Index
- Abstract
Indeterminate acute liver failure (IND-ALF) is a rare clinical syndrome with a high mortality rate. Lacking a known etiology makes rapid evaluation and treatment difficult, with liver transplantation often considered as the only therapeutic option. Our aim was to identify genetic variants from whole exome sequencing data that might be associated with IND-ALF clinical outcomes. Bioinformatics analysis was performed on whole exome sequencing data for 22 patients with IND-ALF. A 2-tier approach was used to identify significant single-nucleotide polymorphisms (SNPs) associated with IND-ALF clinical outcomes. Tier 1 identified the SNPs with a higher relative risk in the IND-ALF population compared with those identified in control populations. Tier 2 determined the SNPs connected to transplant-free survival and associated with model for end-stage liver disease serum sodium and Acute Liver Failure Study Group prognostic scores. Thirty-one SNPs were found associated with a higher relative risk in the IND-ALF population compared with those in controls, of which 11 belong to the human leukocyte antigen (HLA) class II genes but none for the class I. Further analysis showed that 5 SNPs: rs796202376, rs139189937, and rs113473719 of HLA-DRB5; rs9272712 of HLA-DQA1; and rs747397929 of IDO1 were associated with a higher probability of IND-ALF transplant-free survival. Using 3 selected SNPs, a model for the polygenic risk score was developed to predict IND-ALF prognoses, which are comparable with those by model for end-stage liver disease serum sodium and Acute Liver Failure Study Group prognostic scores. Certain gene variants in HLA-DRB5, HLA-DQA1, and IDO1 were found associated with IND-ALF transplant-free survival. Once validated, these identified SNPs may help elucidate the mechanism of IND-ALF and assist in its diagnosis and management.
- Published
- 2022
91. ESSESA: An Expert System for Structure Elucidation from Spectra. 4. Canonical Representation of Structures.
- Author
-
Huixiao Hong and Xin Xinquan
- Published
- 1994
92. ESSESA: An Expert System for Structure Elucidation from Spectra. 5. Substructure Constraints from Analysis of First-Order 1H-NMR Spectra.
- Author
-
Huixiao Hong and Xin Xinquan
- Published
- 1994
93. QSAR in Safety Evaluation and Risk Assessment
- Author
-
Huixiao Hong
- Abstract
QSAR in Safety Evaluation and Risk Assessment provides comprehensive coverage of QSAR methods, tools, data sources, and models, focusing on applications in product safety evaluation and chemical risk assessment. Organized into five parts, the book covers almost all aspects of QSAR modeling and application. Topics in the book include methods of QSAR, from both scientific and regulatory viewpoints; data sources available for facilitating QSAR model development; software tools for QSAR development; and QSAR models developed for assisting safety evaluation and risk assessment. Chapters are authored by a lineup of active scientists in this field. The chapters not only provide professional-level technical summaries but also cover introductory descriptions for all aspects of QSAR for safety evaluation and risk assessment. - Provides comprehensive content about the QSAR techniques and models in facilitating the safety evaluation of drugs and consumer products and risk assessment of environmental chemicals - Includes some of the most cutting-edge methodologies such as deep learning and machine learning for QSAR - Offers detailed procedures of modeling and provides examples of each model's application in real practice
- Published
- 2023
94. Machine Learning and Deep Learning in Computational Toxicology
- Author
-
Huixiao Hong
- Subjects
- Toxicology, Machine learning, Artificial intelligence
- Abstract
This book is a collection of machine learning and deep learning algorithms, methods, architectures, and software tools that have been developed and widely applied in predictive toxicology. It compiles a set of recent applications using state-of-the-art machine learning and deep learning techniques in analysis of a variety of toxicological endpoint data. The contents illustrate those machine learning and deep learning algorithms, methods, and software tools and summarise the applications of machine learning and deep learning in predictive toxicology with informative text, figures, and tables that are contributed by the first tier of experts. One of the major features is the case studies of applications of machine learning and deep learning in toxicological research that serve as examples for readers to learn how to apply machine learning and deep learning techniques in predictive toxicology. This book is expected to provide a reference for practical applications of machine learning and deep learning in toxicological research. It is a useful guide for toxicologists, chemists, drug discovery and development researchers, regulatory scientists, government reviewers, and graduate students. The main benefit for the readers is understanding the widely used machine learning and deep learning techniques and gaining practical procedures for applying machine learning and deep learning in predictive toxicology.
- Published
- 2023
95. Sustainable Management of Synthetic Chemicals
- Author
-
Hao Zhu, Jingwen Chen, Ruili Huang, and Huixiao Hong
- Subjects
Renewable Energy, Sustainability and the Environment ,Sustainable management ,General Chemical Engineering ,Environmental Chemistry ,General Chemistry ,Business ,Environmental planning
- Published
- 2021
- Full Text
- View/download PDF
96. Human transthyretin binding affinity of halogenated thiophenols and halogenated phenols: An in vitro and in silico study
- Author
-
Lianjun Wang, Rebecca Kusko, Jingwen Chen, Huixiao Hong, Xianhai Yang, Wang Ou, Huihui Liu, and Songshan Zhao
- Subjects
inorganic chemicals ,Environmental Engineering ,Molecular model ,Health, Toxicology and Mutagenesis ,In silico ,0208 environmental biotechnology ,Ionic bonding ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Hydrophobic effect ,Molecular recognition ,Phenols ,Environmental Chemistry ,Non-covalent interactions ,Humans ,Prealbumin ,Computer Simulation ,Sulfhydryl Compounds ,0105 earth and related environmental sciences ,chemistry.chemical_classification ,Hydrogen bond ,Public Health, Environmental and Occupational Health ,General Medicine ,General Chemistry ,Ligand (biochemistry) ,Pollution ,Combinatorial chemistry ,020801 environmental engineering ,chemistry
- Abstract
Serious harmful effects have been reported for thiophenols, which are widely used industrial materials. To date, little information is available on whether such chemicals can elicit endocrine-related detrimental effects. Herein, the potential binding affinity and underlying mechanism of action between human transthyretin (hTTR) and seven halogenated-thiophenols were examined experimentally and computationally. Experimental results indicated that the halogenated-thiophenols, except for pentafluorothiophenol, were powerful hTTR binders. Differences in hTTR binding affinity between halogenated-thiophenols and halogenated-phenols were observed. The hTTR binding affinity of mono- and di-halo-thiophenols was higher than that of the corresponding phenols, while the opposite relationship was observed for tri- and penta-halo-thiophenols and phenols. Our results also confirmed that the binding interactions were influenced by the degree of ligand dissociation. Molecular modeling results implied that the dominant noncovalent interactions in the molecular recognition processes between hTTR and halogenated-thiophenols were ion pairs, hydrogen bonds, and hydrophobic interactions. Finally, a model with acceptable predictive ability was developed, which can be used to computationally predict the potential hTTR binding affinity of other halogenated-thiophenols and phenols. Taken together, our results highlight that more research is needed to determine their potential endocrine-related harmful effects and that appropriate management actions should be taken to promote their sustainable use.
- Published
- 2021
97. ESSESA: An expert system for structure elucidation from spectra. 3. LNSCS for chemical knowledge representation.
- Author
-
Huixiao Hong and Xin Xinquan
- Published
- 1992
- Full Text
- View/download PDF
98. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine
- Author
-
Wenming Xiao, Leihong Wu, Gokhan Yavas, Vahan Simonyan, Baitang Ning, and Huixiao Hong
- Subjects
genome ,sequencing ,assembly ,personal genome ,quality metrics ,Pharmacy and materia medica ,RS1-441
- Abstract
Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. 
Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy, and detecting tumors. We believe that precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
- Published
- 2016
- Full Text
- View/download PDF
99. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine
- Author
-
Joshua Xu, Binsheng Gong, Leihong Wu, Shraddha Thakkar, Huixiao Hong, and Weida Tong
- Subjects
genomics ,RNA-seq ,reproducibility ,big data ,next generation sequencing ,Pharmacy and materia medica ,RS1-441
- Abstract
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
- Published
- 2016
- Full Text
- View/download PDF
100. Reproducibility challenges for biomarker detection with uncertain but informative experimental data
- Author
-
Wei V. Zhuang, Luísa Camacho, Camila S. Silva, and Huixiao Hong
- Subjects
0301 basic medicine ,Reproducibility ,business.industry ,Biochemistry (medical) ,Clinical Biochemistry ,Liquid Biopsy ,Inference ,Experimental data ,Reproducibility of Results ,Computational biology ,Real-Time Polymerase Chain Reaction ,03 medical and health sciences ,Circulating biomarkers ,MicroRNAs ,030104 developmental biology ,0302 clinical medicine ,030220 oncology & carcinogenesis ,Drug Discovery ,Medicine ,Biomarker (medicine) ,Humans ,Liquid biopsy ,business ,Biomarkers
- Abstract
Recent studies have revealed that circulating microRNAs are promising biomarkers for detecting toxicity or disease. Quantitative real-time polymerase chain reaction (qPCR) is often used to measure the levels of microRNAs. Besides complete and certain data, investigators inevitably have observed technically incomplete or uncertain qPCR data. Investigators usually set incomplete observations equal to the maximum quality number of qPCR cycles, apply the complete-observation method, or choose not to analyze targets with incomplete observations. Using biostatistical knowledge and published studies, we show that three commonly applied methods tend to cause biased inference and decrease reproducibility in biomarker detection. More efforts are needed to address the challenges to identify and detect reliable, novel circulating biomarkers in liquid biopsies.
- Published
- 2020
Discovery Service for Jio Institute Digital Library