417 results on '"Huixiao Hong"'
Search Results
2. Development of a comprehensive open access 'molecules with androgenic activity resource (MAAR)' to facilitate risk assessment of chemicals
- Author
-
Fan Dong, Barry Hardy, Jie Liu, Tomaz Mohoric, Wenjing Guo, Thomas Exner, Weida Tong, Joh Dohler, Daniel Bachler, and Huixiao Hong
- Subjects
androgen receptor, risk assessment, chemicals, database, open access, Biology (General), QH301-705.5, Medicine
- Abstract
The increasing prevalence of endocrine-disrupting chemicals (EDCs) and their potential adverse effects on human health underscore the necessity for robust tools to assess and manage associated risks. The androgen receptor (AR) is a critical component of the endocrine system, playing a pivotal role in mediating the biological effects of androgens, which are male sex hormones. Exposure to androgen-disrupting chemicals during critical periods of development, such as fetal development or puberty, may result in adverse effects on reproductive health, including altered sexual differentiation, impaired fertility, and an increased risk of reproductive disorders. Therefore, androgenic activity data is critical for chemical risk assessment. A large amount of androgenic data has been generated using various experimental protocols. Moreover, the data are reported in different formats and in diverse sources. To facilitate utilization of androgenic activity data in chemical risk assessment, the Molecules with Androgenic Activity Resource (MAAR) was developed. MAAR is the first open-access platform designed to streamline and enhance the risk assessment of chemicals with androgenic activity. MAAR’s development involved the integration of diverse data sources, including data from public databases and mining literature, to establish a reliable and versatile repository. The platform employs a user-friendly interface, enabling efficient navigation and extraction of pertinent information. MAAR is poised to advance chemical risk assessment by offering unprecedented access to information crucial for evaluating the androgenic potential of a wide array of chemicals. The open-access nature of MAAR promotes transparency and collaboration, fostering a collective effort to address the challenges posed by androgenic EDCs.
- Published
- 2024
- Full Text
- View/download PDF
3. Editorial: Big data and artificial intelligence for genomics and therapeutics – Proceedings of the 19th Annual Meeting of the MidSouth Computational Biology and Bioinformatics Society (MCBIOS)
- Author
-
Huixiao Hong, Inimary Toby-Ogundeji, Robert J. Doerksen, and Zhaohui Steve Qin
- Subjects
big data, bioinformatics, artificial intelligence, genomics, therapeutics, Computer applications to medicine. Medical informatics, R858-859.7
- Published
- 2024
- Full Text
- View/download PDF
4. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance
- Author
-
Luyao Ren, Xiaoke Duan, Lianhua Dong, Rui Zhang, Jingcheng Yang, Yuechen Gao, Rongxue Peng, Wanwan Hou, Yaqing Liu, Jingjing Li, Ying Yu, Naixin Zhang, Jun Shang, Fan Liang, Depeng Wang, Hui Chen, Lele Sun, Lingtong Hao, The Quartet Project Team, Andreas Scherer, Jessica Nordlund, Wenming Xiao, Joshua Xu, Weida Tong, Xin Hu, Peng Jia, Kai Ye, Jinming Li, Li Jin, Huixiao Hong, Jing Wang, Shaohua Fan, Xiang Fang, Yuanting Zheng, and Leming Shi
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. Results: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. Conclusions: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
- Published
- 2023
- Full Text
- View/download PDF
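The "genetic built-in truth" mentioned in the abstract above can be illustrated with a small sketch. It assumes, since the abstract does not spell out the computation, that a call is treated as plausible when the monozygotic twins share a genotype and that genotype obeys Mendelian inheritance from the parents. The names `Genotype`, `mendelian_consistent`, and `quartet_consistent` are hypothetical.

```python
from typing import Tuple

# A diploid genotype as two allele indices (0 = reference allele).
Genotype = Tuple[int, int]


def mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child could inherit one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def quartet_consistent(father: Genotype, mother: Genotype,
                       twin1: Genotype, twin2: Genotype) -> bool:
    """Assumed rule: twins agree (ignoring allele order) and fit the parents."""
    twins_agree = sorted(twin1) == sorted(twin2)
    return twins_agree and mendelian_consistent(father, mother, twin1)


# Twins share a heterozygous genotype that the father can explain.
print(quartet_consistent((0, 1), (0, 0), (0, 1), (1, 0)))
# Neither parent carries allele 1, so the twins' call is suspicious.
print(quartet_consistent((0, 0), (0, 0), (0, 1), (0, 1)))
```

Calls that fail such a check are candidates for artifacts, although a real pipeline would also have to handle de novo mutations, missing genotypes, and sex chromosomes.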
5. The Quartet Data Portal: integration of community-wide resources for multiomics quality control
- Author
-
Jingcheng Yang, Yaqing Liu, Jun Shang, Qiaochu Chen, Qingwang Chen, Luyao Ren, Naixin Zhang, Ying Yu, Zhihui Li, Yueqiang Song, Shengpeng Yang, Andreas Scherer, Weida Tong, Huixiao Hong, Wenming Xiao, Leming Shi, and Yuanting Zheng
- Subjects
Quartet Data Portal, Multiomics, Quartet Project, Quality control, Reference materials, Reference datasets, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established based on a family of four individuals with identical twins from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics, platforms, labs, protocols, and batches. Reproducible analysis tools allow for objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop “distribution-collection-evaluation-integration” workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.
- Published
- 2023
- Full Text
- View/download PDF
6. Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method
- Author
-
Ying Yu, Naixin Zhang, Yuanbang Mai, Luyao Ren, Qiaochu Chen, Zehui Cao, Qingwang Chen, Yaqing Liu, Wanwan Hou, Jingcheng Yang, Huixiao Hong, Joshua Xu, Weida Tong, Lianhua Dong, Leming Shi, Xiang Fang, and Yuanting Zheng
- Subjects
Batch effect, Ratio, Reference materials, Multiomics, Phenomics, Differentially expressed, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms are proposed to facilitate data integration. However, their respective advantages and limitations are not adequately assessed in terms of omics types, the performance metrics, and the application scenarios. Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., by scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interests. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
- Published
- 2023
- Full Text
- View/download PDF
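The ratio-based scaling described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each batch contains replicate measurements of a reference material, and it divides every study-sample feature by the per-feature mean of those replicates. The function name `ratio_scale` and the toy numbers are hypothetical.

```python
from statistics import mean


def ratio_scale(study_samples, reference_replicates):
    """Scale study samples by the per-feature mean of same-batch reference replicates.

    Both arguments are lists of equal-length feature vectors from ONE batch.
    Assumes every reference feature mean is non-zero.
    """
    n_features = len(reference_replicates[0])
    ref_mean = [mean(rep[j] for rep in reference_replicates) for j in range(n_features)]
    return [[sample[j] / ref_mean[j] for j in range(n_features)]
            for sample in study_samples]


# Batch B measures everything twice as high as batch A (a multiplicative batch effect).
batch_a = ratio_scale([[10.0, 20.0]], [[5.0, 10.0], [5.0, 10.0]])
batch_b = ratio_scale([[20.0, 40.0]], [[10.0, 20.0], [10.0, 20.0]])
print(batch_a, batch_b)
```

Because the reference material is profiled alongside the study samples, a multiplicative shift affects numerator and denominator equally and cancels in the ratio, which is why the two toy batches become comparable after scaling.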
7. BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices
- Author
-
Fan Dong, Wenjing Guo, Jie Liu, Tucker A. Patterson, and Huixiao Hong
- Subjects
pharmacovigilance, social media, drug, adverse event, language model (LM), Public aspects of medicine, RA1-1270
- Abstract
Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain. Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection. Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweets data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation. By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
- Published
- 2024
- Full Text
- View/download PDF
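The three F1 scores reported in the abstract above are per-class scores over word labels. As a hedged sketch, the function below computes a per-label F1 from gold and predicted token labels. The label names `B-AE`, `I-AE`, and `O` are an assumption (a common tagging convention), not something the abstract states, and `f1_per_label` is a hypothetical helper.

```python
def f1_per_label(gold, pred):
    """Return {label: F1} computed token by token; F1 = 2TP / (2TP + FP + FN)."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy sentence of four tokens; the model misses the second word of the event.
gold = ["O", "B-AE", "I-AE", "O"]
pred = ["O", "B-AE", "O", "O"]
print(f1_per_label(gold, pred))
```

Reporting one score per label, rather than a single accuracy, keeps the very frequent "not in an adverse event" class from masking errors on the rarer adverse-event words.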
8. Decoding the κ Opioid Receptor (KOR): Advancements in Structural Understanding and Implications for Opioid Analgesic Development
- Author
-
Zoe Li, Ruili Huang, Menghang Xia, Nancy Chang, Wenjing Guo, Jie Liu, Fan Dong, Bailang Liu, Ann Varghese, Aasma Aslam, Tucker A. Patterson, and Huixiao Hong
- Subjects
opioid, receptor, structure, ligand, binding, mechanism, Organic chemistry, QD241-441
- Abstract
The opioid crisis in the United States is a significant public health issue, with a nearly threefold increase in opioid-related fatalities between 1999 and 2014. In response to this crisis, society has made numerous efforts to mitigate its impact. Recent advancements in understanding the structural intricacies of the κ opioid receptor (KOR) have improved our knowledge of how opioids interact with their receptors, triggering downstream signaling pathways that lead to pain relief. This review concentrates on the KOR, offering crucial structural insights into the binding mechanisms of both agonists and antagonists to the receptor. Through comparative analysis of the atomic details of the binding site, distinct interactions specific to agonists and antagonists have been identified. These insights not only enhance our understanding of ligand binding mechanisms but also shed light on potential pathways for developing new opioid analgesics with an improved risk-benefit profile.
- Published
- 2024
- Full Text
- View/download PDF
9. RxNorm for drug name normalization: a case study of prescription opioids in the FDA adverse events reporting system
- Author
-
Huyen Le, Ru Chen, Stephen Harris, Hong Fang, Beverly Lyn-Cook, Huixiao Hong, Weigong Ge, Paul Rogers, Weida Tong, and Wen Zou
- Subjects
RxNorm, RxCUI, FAERS, drug name normalization, prescription opioids, drug safety, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Numerous studies have been conducted on the US Food and Drug Administration (FDA) Adverse Events Reporting System (FAERS) database to assess post-marketing reporting rates for drug safety review and risk assessment. However, the drug names in the adverse event (AE) reports from FAERS were heterogeneous due to a lack of uniformity of information submitted mandatorily by pharmaceutical companies and voluntarily by patients, healthcare professionals, and the public. Studies using FAERS and other spontaneous AE reporting databases without drug name normalization may collect AE reports incompletely because of non-standard drug names, and the accuracy of the results might be affected. In this study, we demonstrated the applicability of RxNorm, developed by the National Library of Medicine, for drug name normalization in FAERS. Using prescription opioids as a case study, we used the RxNorm application programming interface (API) to map all FDA-approved prescription opioids described in FAERS AE reports to their equivalent RxNorm Concept Unique Identifiers (RxCUIs) and RxNorm names. The different names of the opioids were then extracted, and their usage frequencies were calculated in a collection of more than 14.9 million AE reports for 13 FDA-approved prescription opioid classes, reported over 17 years. The results showed that a significant number of different names were consistently used for opioids in FAERS reports, with 2,086 different names (out of 7,892) used at least three times and 842 different names used at least ten times for each of the 92 RxNorm names of FDA-approved opioids. Our method of using RxNorm API mapping was confirmed to be efficient and accurate and capable of reducing the heterogeneity of prescription opioid names significantly in the AE reports in FAERS; meanwhile, it is expected to have a broad application to different sets of drug names from any database where drug names are diverse and unnormalized. It is expected to be able to automatically standardize and link different representations of the same drugs to build an intact and high-quality database for diverse research, particularly postmarketing data analysis in pharmacovigilance initiatives.
- Published
- 2024
- Full Text
- View/download PDF
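The counting step described in the abstract above (how many different reported names map to each normalized name, and how often each is used) can be sketched offline. In the study the mapping comes from the RxNorm API; here it is replaced by a small hand-written dictionary, which is purely an illustrative assumption, as are the function name `variant_name_usage` and the sample names.

```python
from collections import Counter, defaultdict


def variant_name_usage(report_drug_names, name_to_normalized, min_count=1):
    """Group raw drug names by normalized name, keeping usage counts >= min_count.

    name_to_normalized is a stand-in for an API lookup: lower-cased raw name
    -> normalized name. Unmapped names are ignored.
    """
    counts = Counter(name.strip().lower() for name in report_drug_names)
    variants = defaultdict(dict)
    for raw, n in counts.items():
        normalized = name_to_normalized.get(raw)
        if normalized is not None and n >= min_count:
            variants[normalized][raw] = n
    return dict(variants)


# Illustrative data only; not taken from FAERS.
names = ["OxyContin", "oxycontin ", "Oxycodone HCl", "percocet", "aspirin"]
mapping = {
    "oxycontin": "oxycodone",
    "oxycodone hcl": "oxycodone",
    "percocet": "acetaminophen / oxycodone",
}
print(variant_name_usage(names, mapping))
print(variant_name_usage(names, mapping, min_count=2))
```

The `min_count` threshold mirrors the way the abstract reports names "used at least three times" or "at least ten times".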
10. Fingerprinting Interactions between Proteins and Ligands for Facilitating Machine Learning in Drug Discovery
- Author
-
Zoe Li, Ruili Huang, Menghang Xia, Tucker A. Patterson, and Huixiao Hong
- Subjects
molecular fingerprints, 3D structural interaction fingerprints, machine learning, drug discovery, structure–activity relationships, protein–ligand interactions, Microbiology, QR1-502
- Abstract
Molecular recognition is fundamental in biology, underpinning intricate processes through specific protein–ligand interactions. This understanding is pivotal in drug discovery, yet traditional experimental methods face limitations in exploring the vast chemical space. Computational approaches, notably quantitative structure–activity/property relationship analysis, have gained prominence. Molecular fingerprints encode molecular structures and serve as property profiles, which are essential in drug discovery. While two-dimensional (2D) fingerprints are commonly used, three-dimensional (3D) structural interaction fingerprints offer enhanced structural features specific to target proteins. Machine learning models trained on interaction fingerprints enable precise binding prediction. Recent focus has shifted to structure-based predictive modeling, with machine-learning scoring functions excelling due to feature engineering guided by key interactions. Notably, 3D interaction fingerprints are gaining ground due to their robustness. Various structural interaction fingerprints have been developed and used in drug discovery, each with unique capabilities. This review recapitulates the developed structural interaction fingerprints and provides two case studies to illustrate the power of interaction fingerprint-driven machine learning. The first elucidates structure–activity relationships in β2 adrenoceptor ligands, demonstrating the ability to differentiate agonists and antagonists. The second employs a retrosynthesis-based pre-trained molecular representation to predict protein–ligand dissociation rates, offering insights into binding kinetics. Despite remarkable progress, challenges persist in interpreting complex machine learning models built on 3D fingerprints, emphasizing the need for strategies to make predictions interpretable. Binding site plasticity and induced fit effects pose additional complexities. 
Interaction fingerprints are promising but require continued research to harness their full potential.
- Published
- 2024
- Full Text
- View/download PDF
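To make the idea of a structural interaction fingerprint from the abstract above concrete, here is a minimal sketch under stated assumptions: each bit records whether a given binding-site residue makes a given type of contact with the ligand, and two fingerprints are compared with the Tanimoto coefficient. The review covers many fingerprint designs; this bit layout, the residue labels, and the function names are hypothetical and are not any specific published scheme.

```python
def interaction_fingerprint(contacts, residues, interaction_types):
    """Encode (residue, interaction type) contacts as a fixed-order bit list."""
    contact_set = set(contacts)
    return [1 if (res, kind) in contact_set else 0
            for res in residues
            for kind in interaction_types]


def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by on-bits present in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


# Hypothetical binding site with two residues and two contact types.
residues = ["ASP113", "SER203"]
kinds = ["hbond", "hydrophobic"]
ligand_a = interaction_fingerprint([("ASP113", "hbond"), ("SER203", "hbond")], residues, kinds)
ligand_b = interaction_fingerprint([("ASP113", "hbond")], residues, kinds)
print(ligand_a, ligand_b, tanimoto(ligand_a, ligand_b))
```

Bit vectors of this kind can be fed directly to a standard classifier, which is the sense in which the abstract describes fingerprints as facilitating machine learning.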
11. Computational Nanotoxicology Models for Environmental Risk Assessment of Engineered Nanomaterials
- Author
-
Weihao Tang, Xuejiao Zhang, Huixiao Hong, Jingwen Chen, Qing Zhao, and Fengchang Wu
- Subjects
engineered nanomaterials, computational nanotoxicology, exposure models, hazard models, Chemistry, QD1-999
- Abstract
Although engineered nanomaterials (ENMs) have tremendous potential to generate technological benefits in numerous sectors, uncertainty on the risks of ENMs for human health and the environment may impede the advancement of novel materials. Traditionally, the risks of ENMs can be evaluated by experimental methods such as environmental field monitoring and animal-based toxicity testing. However, it is time-consuming, expensive, and impractical to evaluate the risk of the increasingly large number of ENMs with the experimental methods. On the contrary, with the advancement of artificial intelligence and machine learning, in silico methods have recently received more attention in the risk assessment of ENMs. This review discusses the key progress of computational nanotoxicology models for assessing the risks of ENMs, including material flow analysis models, multimedia environmental models, physiologically based toxicokinetics models, quantitative nanostructure–activity relationships, and meta-analysis. Several challenges are identified and a perspective is provided regarding how the challenges can be addressed.
- Published
- 2024
- Full Text
- View/download PDF
12. Text Fingerprinting and Topic Mining in the Prescription Opioid Use Literature.
- Author
-
Huyen Le, Junxiu Zhou, Weizhong Zhao, Roger Perkins, Weigong Ge, Beverly Lyn-Cook, Henry Francis, Huixiao Hong, Weida Tong, and Wen Zou
- Published
- 2021
- Full Text
- View/download PDF
13. Discovering Drug-Drug Associations in the FDA Adverse Event Reporting System Database with Data Mining Approaches.
- Author
-
Weizhong Zhao, Huyen Le, James J. Chen, Hesha J. Duggirala, Richard Forshee, Taxiarchis Botsis, Henry Francis, Huixiao Hong, Weida Tong, Yi-Ting Hwang, and Wen Zou
- Published
- 2021
- Full Text
- View/download PDF
14. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing
- Author
-
Bohu Pan, Luyao Ren, Vitor Onuchic, Meijian Guan, Rebecca Kusko, Steve Bruinsma, Len Trigg, Andreas Scherer, Baitang Ning, Chaoyang Zhang, Christine Glidewell-Kenney, Chunlin Xiao, Eric Donaldson, Fritz J. Sedlazeck, Gary Schroth, Gokhan Yavas, Haiying Grunenwald, Haodong Chen, Heather Meinholz, Joe Meehan, Jing Wang, Jingcheng Yang, Jonathan Foox, Jun Shang, Kelci Miclaus, Lianhua Dong, Leming Shi, Marghoob Mohiyuddin, Mehdi Pirooznia, Ping Gong, Rooz Golshani, Russ Wolfinger, Samir Lababidi, Sayed Mohammad Ebrahim Sahraeian, Steve Sherry, Tao Han, Tao Chen, Tieliu Shi, Wanwan Hou, Weigong Ge, Wen Zou, Wenjing Guo, Wenjun Bao, Wenzhong Xiao, Xiaohui Fan, Yoichi Gondo, Ying Yu, Yongmei Zhao, Zhenqiang Su, Zhichao Liu, Weida Tong, Wenming Xiao, Justin M. Zook, Yuanting Zheng, and Huixiao Hong
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
- Published
- 2022
- Full Text
- View/download PDF
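One simple way to quantify the reproducibility discussed in the abstract above is the fraction of variants found in every replicate out of all variants found in any replicate. This is a stand-in metric chosen for illustration; the abstract does not state the exact measure the study used, and the function name and toy variants are hypothetical.

```python
def reproducibility(call_sets):
    """Variants shared by ALL replicates divided by variants seen in ANY replicate.

    Each call set is an iterable of hashable variant keys, here
    (chromosome, position, ref allele, alt allele).
    """
    sets = [set(calls) for calls in call_sets]
    if not sets:
        raise ValueError("need at least one call set")
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 1.0


# Toy triplicate: two SNVs are called every time, two variants only once.
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr1", 2000, "C", "T")
v3 = ("chr2", 500, "G", "GTTTTTT")  # a >5 bp insertion, called once
v4 = ("chr3", 42, "T", "C")
print(reproducibility([[v1, v2, v3], [v1, v2], [v1, v2, v4]]))
```

Computing the same ratio separately for SNVs and indels, or per pipeline, would allow the kind of comparison the abstract summarizes.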
15. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample
- Author
-
Sayed Mohammad Ebrahim Sahraeian, Li Tai Fang, Konstantinos Karagiannis, Malcolm Moos, Sean Smith, Luis Santana-Quintero, Chunlin Xiao, Michael Colgan, Huixiao Hong, Marghoob Mohiyuddin, and Wenming Xiao
- Subjects
Somatic mutation, Deep learning, Convolutional neural networks, Well-characterized somatic reference samples, Model training strategies, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data. Results: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. Conclusions: The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
- Published
- 2022
- Full Text
- View/download PDF
16. Hidden biases in germline structural variant detection
- Author
-
Michael M. Khayat, Sayed Mohammad Ebrahim Sahraeian, Samantha Zarate, Andrew Carroll, Huixiao Hong, Bohu Pan, Leming Shi, Richard A. Gibbs, Marghoob Mohiyuddin, Yuanting Zheng, and Fritz J. Sedlazeck
- Subjects
Next-generation sequencing, Structural variations, Genomic variability, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. Results: In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. Conclusions: This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.
- Published
- 2021
- Full Text
- View/download PDF
17. The SEQC2 epigenomics quality control (EpiQC) study
- Author
-
Jonathan Foox, Jessica Nordlund, Claudia Lalancette, Ting Gong, Michelle Lacey, Samantha Lent, Bradley W. Langhorst, V. K. Chaithanya Ponnaluri, Louise Williams, Karthik Ramaswamy Padmanabhan, Raymond Cavalcante, Anders Lundmark, Daniel Butler, Christopher Mozsary, Justin Gurvitch, John M. Greally, Masako Suzuki, Mark Menor, Masaki Nasu, Alicia Alonso, Caroline Sheridan, Andreas Scherer, Stephen Bruinsma, Gosia Golda, Agata Muszynska, Paweł P. Łabaj, Matthew A. Campbell, Frank Wos, Amanda Raine, Ulrika Liljedahl, Tomas Axelsson, Charles Wang, Zhong Chen, Zhaowei Yang, Jing Li, Xiaopeng Yang, Hongwei Wang, Ari Melnick, Shang Guo, Alexander Blume, Vedran Franke, Inmaculada Ibanez de Caceres, Carlos Rodriguez-Antolin, Rocio Rosas, Justin Wade Davis, Jennifer Ishii, Dalila B. Megherbi, Wenming Xiao, Will Liao, Joshua Xu, Huixiao Hong, Baitang Ning, Weida Tong, Altuna Akalin, Yunliang Wang, Youping Deng, and Christopher E. Mason
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Cytosine modifications in DNA such as 5-methylcytosine (5mC) underlie a broad range of developmental processes, maintain cellular lineage specification, and can define or stratify types of cancer and other diseases. However, the wide variety of approaches available to interrogate these modifications has created a need for harmonized materials, methods, and rigorous benchmarking to improve genome-wide methylome sequencing applications in clinical and basic research. Here, we present a multi-platform assessment and cross-validated resource for epigenetics research from the FDA’s Epigenomics Quality Control Group. Results: Each sample is processed in multiple replicates by three whole-genome bisulfite sequencing (WGBS) protocols (TruSeq DNA methylation, Accel-NGS MethylSeq, and SPLAT), oxidative bisulfite sequencing (TrueMethyl), enzymatic deamination method (EMSeq), targeted methylation sequencing (Illumina Methyl Capture EPIC), single-molecule long-read nanopore sequencing from Oxford Nanopore Technologies, and 850k Illumina methylation arrays. After rigorous quality assessment and comparison to Illumina EPIC methylation microarrays and testing on a range of algorithms (Bismark, BitmapperBS, bwa-meth, and BitMapperBS), we find overall high concordance between assays, but also differences in efficiency of read mapping, CpG capture, coverage, and platform performance, and variable performance across 26 microarray normalization algorithms. Conclusions: The data provided herein can guide the use of these DNA reference materials in epigenomics research, as well as provide best practices for experimental design in future studies. By leveraging seven human cell lines that are designated as publicly available reference materials, these data can be used as a baseline to advance epigenomics research.
- Published
- 2021
- Full Text
- View/download PDF
18. Editorial: Cell signaling status alteration in development and disease
- Author
-
Jun Wu, Haipeng Liu, Xiaodong Zhao, Huixiao Hong, and Johannes Werner
- Subjects
signal pathway, CGAS, STING, multi-omic analyses, Wnt signaling, Biology (General), QH301-705.5
- Published
- 2022
- Full Text
- View/download PDF
19. Deep Learning Methods for Omics Data Imputation
- Author
-
Lei Huang, Meng Song, Hui Shen, Huixiao Hong, Ping Gong, Hong-Wen Deng, and Chaoyang Zhang
- Subjects
omics imputation, deep learning, multi-omics imputation, Biology (General), QH301-705.5
- Abstract
One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships within features, technical variations introduced by sampling methods, sample heterogeneity, and the non-random missingness mechanism. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Due to its capability of modeling complex patterns and relationships in large and high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoder, variational autoencoder, generative adversarial networks, and Transformer, with an emphasis on multi-omics data imputation. In addition, this review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
- Published
- 2023
- Full Text
- View/download PDF
20. Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study
- Author
-
Yongmei Zhao, Li Tai Fang, Tsai-wei Shen, Sulbha Choudhari, Keyur Talsania, Xiongfong Chen, Jyoti Shetty, Yuliya Kriga, Bao Tran, Bin Zhu, Zhong Chen, Wanqiu Chen, Charles Wang, Erich Jaeger, Daoud Meerzaman, Charles Lu, Kenneth Idler, Luyao Ren, Yuanting Zheng, Leming Shi, Virginie Petitjean, Marc Sultan, Tiffany Hung, Eric Peters, Jiri Drabek, Petr Vojta, Roberta Maestro, Daniela Gasparotto, Sulev Kõks, Ene Reimann, Andreas Scherer, Jessica Nordlund, Ulrika Liljedahl, Jonathan Foox, Christopher E. Mason, Chunlin Xiao, Huixiao Hong, and Wenming Xiao
- Subjects
Science
- Abstract
Measurement(s): Somatic Mutation Analysis
Technology Type(s): whole genome sequencing • Whole Exome Sequencing
Factor Type(s): sequencing platform • sample preparation • library preparation • bioinformatics method
Sample Characteristic - Organism: Homo sapiens
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.16713655
- Published
- 2021
- Full Text
- View/download PDF
21. An autoencoder-based deep learning method for genotype imputation
- Author
-
Meng Song, Jonathan Greenbaum, Joseph Luttrell, Weihua Zhou, Chong Wu, Zhe Luo, Chuan Qiu, Lan Juan Zhao, Kuan-Jui Su, Qing Tian, Hui Shen, Huixiao Hong, Ping Gong, Xinghua Shi, Hong-Wen Deng, and Chaoyang Zhang
- Subjects
genotype imputation ,deep learning ,autoencoder ,paired sample t-test ,GWAS ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Genotype imputation has a wide range of applications in genome-wide association study (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, it remains a challenging task to optimize the learning process in DL-based methods to achieve high imputation accuracy. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop by modifying the training process with a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1,000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model has achieved comparable or better performance than the existing SCDA model in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. As for the results of LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at the missing ratio of 20%. In summary, our proposed method for genotype imputation has a great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
- Published
- 2022
- Full Text
- View/download PDF
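The concordance rate (CR) reported in the abstract above can be illustrated with a minimal sketch. It assumes CR is the fraction of masked genotype positions where the imputed call equals the true call, which is the usual definition but is not spelled out in the abstract. The function name, the 0/1/2 genotype coding, and the toy data are hypothetical.

```python
def concordance_rate(true_genotypes, imputed_genotypes, masked_positions):
    """Fraction of masked positions whose imputed genotype equals the true one."""
    if not masked_positions:
        raise ValueError("no masked positions to evaluate")
    matches = sum(
        1 for i in masked_positions if true_genotypes[i] == imputed_genotypes[i]
    )
    return matches / len(masked_positions)


# Hypothetical toy data: genotypes coded as 0, 1, or 2 copies of the alternate allele.
truth = [0, 1, 2, 1, 0, 2, 1, 0, 0, 2]
imputed = [0, 1, 2, 0, 0, 2, 1, 0, 1, 2]
masked = [1, 3, 5, 8]  # positions hidden before imputation (40% missing ratio here)

print(concordance_rate(truth, imputed, masked))
```

Only the masked positions are scored, so observed genotypes that the model simply copies through do not inflate the metric.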
22. Machine learning models for rat multigeneration reproductive toxicity prediction
- Author
-
Jie Liu, Wenjing Guo, Fan Dong, Jason Aungst, Suzanne Fitzpatrick, Tucker A. Patterson, and Huixiao Hong
- Subjects
multigeneration reproductive toxicity ,machine learning ,molecular descriptor ,consensus model ,toxicity prediction ,Therapeutics. Pharmacology ,RM1-950 - Abstract
Reproductive toxicity is one of the prominent endpoints in the risk assessment of environmental and industrial chemicals. Due to the complexity of the reproductive system, traditional reproductive toxicity testing in animals, especially guideline multigeneration reproductive toxicity studies, takes a long time and is expensive. Therefore, machine learning, as a promising alternative approach, should be considered when evaluating the reproductive toxicity of chemicals. We curated rat multigeneration reproductive toxicity testing data of 275 chemicals from ToxRefDB (Toxicity Reference Database) and developed predictive models using seven machine learning algorithms (decision tree, decision forest, random forest, k-nearest neighbors, support vector machine, linear discriminant analysis, and logistic regression). A consensus model was built based on the seven individual models. An external validation set was curated from the COSMOS database and the literature. The performances of individual and consensus models were evaluated using 500 iterations of 5-fold cross-validations and the external validation data set. The balanced accuracy of the models ranged from 58% to 65% in the 5-fold cross-validations and from 45% to 61% in the external validations. Prediction confidence analysis was conducted to provide additional information for more appropriate applications of the developed models. The impact of our findings is in increasing confidence in machine learning models. We demonstrate the importance of using consensus models for harnessing the benefits of multiple machine learning models (i.e., using redundant systems to check validity of outcomes). While we continue to build upon the models to better characterize weak toxicants, there is current utility in saving resources by being able to screen out strong reproductive toxicants before investing in in vivo testing. The modeling approach (machine learning models) is offered for assessing the rat multigeneration reproductive toxicity of chemicals. Our results suggest that machine learning may be a promising alternative approach to evaluate the potential reproductive toxicity of chemicals.
- Published
- 2022
- Full Text
- View/download PDF
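The consensus model and balanced accuracy mentioned in the abstract above can be sketched as follows. The abstract does not say how the seven models were combined, so simple majority voting is an assumption here, and the three toy models, the function names, and the data are hypothetical.

```python
def consensus_vote(predictions_per_model):
    """Majority vote across models; each inner list holds one model's 0/1 calls."""
    n_models = len(predictions_per_model)
    n_samples = len(predictions_per_model[0])
    consensus = []
    for j in range(n_samples):
        positive_votes = sum(model[j] for model in predictions_per_model)
        # Call a chemical toxic (1) only when more than half of the models agree.
        consensus.append(1 if positive_votes * 2 > n_models else 0)
    return consensus


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical predictions from three models for six chemicals.
models = [
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
]
y_true = [1, 0, 1, 0, 0, 1]

consensus = consensus_vote(models)
print(consensus, balanced_accuracy(y_true, consensus))
```

Balanced accuracy weights the toxic and non-toxic classes equally, which makes it a reasonable summary when the two classes differ in size.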
23. Informing selection of drugs for COVID-19 treatment through adverse events analysis
- Author
-
Wenjing Guo, Bohu Pan, Sugunadevi Sakkiah, Zuowei Ji, Gokhan Yavas, Yanhui Lu, Takashi E. Komatsu, Madhu Lal-Nag, Weida Tong, Tucker A. Patterson, and Huixiao Hong
- Subjects
Medicine ,Science - Abstract
Abstract Coronavirus disease 2019 (COVID-19) is an ongoing pandemic and there is an urgent need for safe and effective drugs for COVID-19 treatment. Since developing a new drug is time consuming, many approved or investigational drugs have been repurposed for COVID-19 treatment in clinical trials. Therefore, selection of safe drugs for COVID-19 patients is vital for combating this pandemic. Our goal was to evaluate the safety concerns of drugs by analyzing adverse events reported in post-market surveillance. We collected 296 drugs that have been evaluated in clinical trials for COVID-19 and identified 28,597,464 associated adverse events at the system organ class (SOC) level in the FDA Adverse Event Reporting System (FAERS). We calculated Z-scores of SOCs that statistically quantify the relative frequency of adverse events of drugs in FAERS to quantitatively measure safety concerns for the drugs. Analyzing the Z-scores revealed that these drugs are associated with different significantly frequent adverse events. Our results suggest that this safety concern metric may serve as a tool to inform selection of drugs with favorable safety profiles for COVID-19 patients in clinical practice. Caution is advised when administering drugs with high Z-scores to patients who are vulnerable to associated adverse events.
- Published
- 2021
- Full Text
- View/download PDF
24. Predictive Models to Identify Small Molecule Activators and Inhibitors of Opioid Receptors.
- Author
-
Srilatha Sakamuru, Jinghua Zhao, Menghang Xia, Huixiao Hong, Anton Simeonov, Iosif I. Vaisman, and Ruili Huang
- Published
- 2021
- Full Text
- View/download PDF
25. Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions
- Author
-
Binsheng Gong, Dan Li, Rebecca Kusko, Natalia Novoradovskaya, Yifan Zhang, Shangzi Wang, Carlos Pabón-Peña, Zhihong Zhang, Kevin Lai, Wanshi Cai, Jennifer S. LoCoco, Eric Lader, Todd A. Richmond, Vinay K. Mittal, Liang-Chun Liu, Donald J. Johann, James C. Willey, Pierre R. Bushel, Ying Yu, Chang Xu, Guangchun Chen, Daniel Burgess, Simon Cawley, Kristina Giorda, Nathan Haseley, Fujun Qiu, Katherine Wilkins, Hanane Arib, Claire Attwooll, Kevin Babson, Longlong Bao, Wenjun Bao, Anne Bergstrom Lucas, Hunter Best, Ambica Bhandari, Halil Bisgin, James Blackburn, Thomas M. Blomquist, Lisa Boardman, Blake Burgher, Daniel J. Butler, Chia-Jung Chang, Alka Chaubey, Tao Chen, Marco Chierici, Christopher R. Chin, Devin Close, Jeffrey Conroy, Jessica Cooley Coleman, Daniel J. Craig, Erin Crawford, Angela del Pozo, Ira W. Deveson, Daniel Duncan, Agda Karina Eterovic, Xiaohui Fan, Jonathan Foox, Cesare Furlanello, Abhisek Ghosal, Sean Glenn, Meijian Guan, Christine Haag, Xinyi Hang, Scott Happe, Brittany Hennigan, Jennifer Hipp, Huixiao Hong, Kyle Horvath, Jianhong Hu, Li-Yuan Hung, Mirna Jarosz, Jennifer Kerkhof, Benjamin Kipp, David Philip Kreil, Paweł Łabaj, Pablo Lapunzina, Peng Li, Quan-Zhen Li, Weihua Li, Zhiguang Li, Yu Liang, Shaoqing Liu, Zhichao Liu, Charles Ma, Narasimha Marella, Rubén Martín-Arenas, Dalila B. Megherbi, Qingchang Meng, Piotr A. Mieczkowski, Tom Morrison, Donna Muzny, Baitang Ning, Barbara L. Parsons, Cloud P. Paweletz, Mehdi Pirooznia, Wubin Qu, Amelia Raymond, Paul Rindler, Rebecca Ringler, Bekim Sadikovic, Andreas Scherer, Egbert Schulze, Robert Sebra, Rita Shaknovich, Qiang Shi, Tieliu Shi, Juan Carlos Silla-Castro, Melissa Smith, Mario Solís López, Ping Song, Daniel Stetson, Maya Strahl, Alan Stuart, Julianna Supplee, Philippe Szankasi, Haowen Tan, Lin-ya Tang, Yonghui Tao, Shraddha Thakkar, Danielle Thierry-Mieg, Jean Thierry-Mieg, Venkat J. Thodima, David Thomas, Boris Tichý, Nikola Tom, Elena Vallespin Garcia, Suman Verma, Kimbley Walker, Charles Wang, Junwen Wang, Yexun Wang, Zhining Wen, Valtteri Wirta, Leihong Wu, Chunlin Xiao, Wenzhong Xiao, Shibei Xu, Mary Yang, Jianming Ying, Shun H. Yip, Guangliang Zhang, Sa Zhang, Meiru Zhao, Yuanting Zheng, Xiaoyan Zhou, Christopher E. Mason, Timothy Mercer, Weida Tong, Leming Shi, Wendell Jones, and Joshua Xu
- Subjects
Oncopanel sequencing ,Target enrichment ,Molecular diagnostics ,Reproducibility ,Analytical performance ,Precision medicine ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Abstract Background Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. Results All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5–20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. Conclusion This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
- Published
- 2021
- Full Text
- View/download PDF
26. Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets
- Author
-
Gabriel Idakwo, Sundar Thangapandian, Joseph Luttrell, Yan Li, Nan Wang, Zhaoxian Zhou, Huixiao Hong, Bei Yang, Chaoyang Zhang, and Ping Gong
- Subjects
Structure–activity relationship (SAR) ,Chemical classification ,Molecular fingerprints ,Random forest (RF) ,Ensemble learning ,Bootstrap aggregation (bagging) ,Information technology ,T58.5-58.64 ,Chemistry ,QD1-999 - Abstract
Abstract The specificity of toxicant-target biomolecule interactions lends to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods. We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
- Published
- 2020
- Full Text
- View/download PDF
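The imbalance ratio (IR) defined in the abstract above, together with the Matthews correlation coefficient it names as an evaluation metric, can be computed as in the following sketch. The IR formula follows the abstract (inactive count divided by active count); the function names and the toy assay data are hypothetical.

```python
import math


def imbalance_ratio(labels):
    """Number of inactive (0) compounds divided by number of active (1) compounds."""
    actives = sum(1 for y in labels if y == 1)
    inactives = sum(1 for y in labels if y == 0)
    if actives == 0:
        raise ValueError("imbalance ratio is undefined without active compounds")
    return inactives / actives


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical assay: 2 active and 58 inactive compounds.
y_true = [1, 1] + [0] * 58
# Hypothetical classifier output: one hit, one miss, one false alarm.
y_pred = [1, 0, 1] + [0] * 57

print(imbalance_ratio(y_true))            # 29.0, above the example threshold of 28
print(matthews_corrcoef(y_true, y_pred))
```

Plain accuracy on this toy set would be 58/60 despite half of the actives being missed, which is why metrics such as the Matthews correlation coefficient are preferred for imbalanced data.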
27. Editorial: Unleashing Innovation on Precision Public Health–Highlights From the MCBIOS and MAQC 2021 Joint Conference
- Author
-
Ramin Homayouni, Huixiao Hong, Prashanti Manda, Bindu Nanduri, and Inimary T. Toby
- Subjects
machine learning ,genomics ,adverse drug effects ,alternatives to animal testing ,artificial intelligence ,Electronic computers. Computer science ,QA75.5-76.95 - Published
- 2022
- Full Text
- View/download PDF
28. Development of a Nicotinic Acetylcholine Receptor nAChR α7 Binding Activity Prediction Model.
- Author
-
Sugunadevi Sakkiah, Carmine Leggett, Bohu Pan, Wenjing Guo, Luis G. Valerio, and Huixiao Hong
- Published
- 2020
- Full Text
- View/download PDF
29. dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies
- Author
-
Gokhan Yavas, Huixiao Hong, and Wenming Xiao
- Subjects
de novo genome assembly ,Assembly quality assessment ,Next generation sequencing ,Misassembly ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Abstract Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. Conclusions The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated.
- Published
- 2019
- Full Text
- View/download PDF
30. Similarities and differences between variants called with human reference genome HG19 or HG38
- Author
-
Bohu Pan, Rebecca Kusko, Wenming Xiao, Yuanting Zheng, Zhichao Liu, Chunlin Xiao, Sugunadevi Sakkiah, Wenjing Guo, Ping Gong, Chaoyang Zhang, Weigong Ge, Leming Shi, Weida Tong, and Huixiao Hong
- Subjects
Next generation sequencing ,Human reference genomes ,SNV ,Calling pipeline comparison ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. Methods We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. Results The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected). Conclusion A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome.
- Published
- 2019
- Full Text
- View/download PDF
31. Deep Learning Models for Predicting Gas Adsorption Capacity of Nanomaterials
- Author
-
Wenjing Guo, Jie Liu, Fan Dong, Ru Chen, Jayanti Das, Weigong Ge, Xiaoming Xu, and Huixiao Hong
- Subjects
metal–organic framework ,gas adsorption ,deep learning ,Chemistry ,QD1-999 - Abstract
Metal–organic frameworks (MOFs), a class of porous nanomaterials, have been widely used in gas adsorption-based applications due to their high porosities and chemical tunability. To facilitate the discovery of high-performance MOFs for different applications, a variety of machine learning models have been developed to predict the gas adsorption capacities of MOFs. Most of the predictive models are developed using traditional machine learning algorithms. However, the continuously increasing sizes of MOF datasets and the complicated relationships between MOFs and their gas adsorption capacities make deep learning a suitable candidate to handle such big data with increased computational power and accuracy. In this study, we developed models for predicting gas adsorption capacities of MOFs using two deep learning algorithms, multilayer perceptron (MLP) and long short-term memory (LSTM) networks, with a hypothetical set of about 130,000 structures of MOFs with methane and carbon dioxide adsorption data at different pressures. The models were evaluated using 10 iterations of 10-fold cross validations and 100 holdout validations. The MLP and LSTM models performed similarly with high prediction accuracy. The models for predicting gas adsorption at a higher pressure outperformed the models for predicting gas adsorption at a lower pressure. The deep learning models are more accurate than the random forest models reported in the literature, especially for predicting gas adsorption capacities at low pressures. Our results demonstrated that deep learning algorithms have a great potential to generate models that can accurately predict the gas adsorption capacities of MOFs.
- Published
- 2022
- Full Text
- View/download PDF
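The evaluation scheme named in the abstract above, 10 iterations of 10-fold cross validation, can be sketched as an index generator. This is a minimal illustration rather than the authors' procedure: the shuffling strategy, the seed, the function name, and the toy sample count are all assumptions.

```python
import random


def repeated_kfold(n_samples, n_folds, n_iterations, seed=0):
    """Yield (iteration, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for iteration in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every iteration
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield iteration, k, train_idx, test_idx


# Hypothetical toy run: 20 samples, 10 folds, 10 iterations.
splits = list(repeated_kfold(n_samples=20, n_folds=10, n_iterations=10))
print(len(splits))  # 100 train/test splits in total
```

Repeating the partition with different shuffles gives a distribution of scores per model instead of a single estimate, which is what makes the spread of performance visible.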
32. Author Correction: The SEQC2 epigenomics quality control (EpiQC) study
- Author
-
Jonathan Foox, Jessica Nordlund, Claudia Lalancette, Ting Gong, Michelle Lacey, Samantha Lent, Bradley W. Langhorst, V. K. Chaithanya Ponnaluri, Louise Williams, Karthik Ramaswamy Padmanabhan, Raymond Cavalcante, Anders Lundmark, Daniel Butler, Christopher Mozsary, Justin Gurvitch, John M. Greally, Masako Suzuki, Mark Menor, Masaki Nasu, Alicia Alonso, Caroline Sheridan, Andreas Scherer, Stephen Bruinsma, Gosia Golda, Agata Muszynska, Paweł P. Łabaj, Matthew A. Campbell, Frank Wos, Amanda Raine, Ulrika Liljedahl, Tomas Axelsson, Charles Wang, Zhong Chen, Zhaowei Yang, Jing Li, Xiaopeng Yang, Hongwei Wang, Ari Melnick, Shang Guo, Alexander Blume, Vedran Franke, Inmaculada Ibanez de Caceres, Carlos Rodriguez-Antolin, Rocio Rosas, Justin Wade Davis, Jennifer Ishii, Dalila B. Megherbi, Wenming Xiao, Will Liao, Joshua Xu, Huixiao Hong, Baitang Ning, Weida Tong, Altuna Akalin, Yunliang Wang, Youping Deng, and Christopher E. Mason
- Subjects
Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Published
- 2021
- Full Text
- View/download PDF
33. Elucidating Interactions Between SARS-CoV-2 Trimeric Spike Protein and ACE2 Using Homology Modeling and Molecular Dynamics Simulations
- Author
-
Sugunadevi Sakkiah, Wenjing Guo, Bohu Pan, Zuowei Ji, Gokhan Yavas, Marli Azevedo, Jessica Hawes, Tucker A. Patterson, and Huixiao Hong
- Subjects
SARS-CoV-2 ,spike protein ,molecular dynamics simulations ,homology modeling ,COVID-19 ,Chemistry ,QD1-999 - Abstract
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19). As of October 21, 2020, more than 41.4 million confirmed cases and 1.1 million deaths have been reported. Thus, it is immensely important to develop drugs and vaccines to combat COVID-19. The spike protein present on the outer surface of the virion plays a major role in viral infection by binding to receptor proteins present on the outer membrane of host cells, triggering membrane fusion and internalization, which enables release of viral ssRNA into the host cell. Understanding the interactions between the SARS-CoV-2 trimeric spike protein and its host cell receptor protein, angiotensin converting enzyme 2 (ACE2), is important for developing drugs and vaccines to prevent and treat COVID-19. Several crystal structures of partial and mutant SARS-CoV-2 spike proteins have been reported; however, an atomistic structure of the wild-type SARS-CoV-2 trimeric spike protein complexed with ACE2 is not yet available. Therefore, in our study, homology modeling was used to build the trimeric form of the spike protein complexed with human ACE2, followed by all-atom molecular dynamics simulations to elucidate interactions at the interface between the spike protein and ACE2. Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) and in silico alanine scanning were employed to characterize the interacting residues at the interface. Twenty interacting residues in the spike protein were identified that are likely to be responsible for tightly binding to ACE2, of which five residues (Val445, Thr478, Gly485, Phe490, and Ser494) were not reported in the crystal structure of the truncated spike protein receptor binding domain (RBD) complexed with ACE2. These data indicate that the interactions between ACE2 and the tertiary structure of the full-length spike protein trimer are different from those between ACE2 and the truncated monomer of the spike protein RBD. These findings could facilitate the development of drugs and vaccines to prevent SARS-CoV-2 infection and combat COVID-19.
- Published
- 2021
- Full Text
- View/download PDF
34. Comparative toxicogenomics of three insensitive munitions constituents 2,4-dinitroanisole, nitroguanidine and nitrotriazolone in the soil nematode Caenorhabditis elegans.
- Author
-
Ping Gong, Keri B. Donohue, Anne M. Mayo, Yuping Wang, Huixiao Hong, Mitchell S. Wilbanks, Natalie D. Barker, Xin Guan, and Kurt A. Gust
- Published
- 2018
- Full Text
- View/download PDF
35. In Silico Pharmacoepidemiologic Evaluation of Drug-Induced Cardiovascular Complications Using Combined Classifiers.
- Author
-
Chuipu Cai, Jiansong Fang, Pengfei Guo, Qi Wang, Huixiao Hong, Javid Moslehi, and Feixiong Cheng
- Published
- 2018
- Full Text
- View/download PDF
36. Factorial analysis of error correction performance using simulated next-generation sequencing data.
- Author
-
Isaac Akogwu, Nan Wang, Chaoyang Zhang, Hwanseok Choi, Huixiao Hong, and Ping Gong
- Published
- 2016
- Full Text
- View/download PDF
37. Deep learning architectures for multi-label classification of intelligent health risk prediction
- Author
-
Andrew Maxwell, Runzhi Li, Bei Yang, Heng Weng, Aihua Ou, Huixiao Hong, Zhaoxian Zhou, Ping Gong, and Chaoyang Zhang
- Subjects
Deep neural networks ,Deep learning ,Intelligent health risk prediction ,Multi-label classification ,Medical health records ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Multi-label classification of data remains a challenging problem. Because of the complexity of the data, it is sometimes difficult to infer information about classes that are not mutually exclusive. For medical data, patients could have symptoms of multiple different diseases at the same time and it is important to develop tools that help to identify problems early. Intelligent health risk prediction models built with deep learning architectures offer a powerful tool for physicians to identify patterns in patient data that indicate risks associated with certain types of chronic diseases. Results Physical examination records of 110,300 anonymous patients were used to predict diabetes, hypertension, fatty liver, a combination of these three chronic diseases, and the absence of disease (8 classes in total). The dataset was split into training (90%) and testing (10%) sub-datasets. Ten-fold cross validation was used to evaluate prediction accuracy with metrics such as precision, recall, and F-score. Deep Learning (DL) architectures were compared with standard and state-of-the-art multi-label classification methods. Preliminary results suggest that Deep Neural Networks (DNN), a DL architecture, when applied to multi-label classification of chronic diseases, produced accuracy that was comparable to that of common methods such as Support Vector Machines. We have implemented DNNs to handle both problem transformation and algorithm adaptation type multi-label methods and compared the two to see which is preferable. Conclusions Deep Learning architectures have the potential of inferring more information about the patterns of physical examination data than common classification methods. The advanced techniques of Deep Learning can be used to identify the significance of different features from physical examination data as well as to learn the contributions of each feature that impact a patient’s risk for chronic diseases. However, accurate prediction of chronic disease risks remains a challenging problem that warrants further studies.
- Published
- 2017
- Full Text
- View/download PDF
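The "8 classes in total" in the abstract above follow from three binary disease labels (2 × 2 × 2 combinations). One common problem-transformation scheme, the label powerset, maps each combination to a single class, as in the sketch below. The abstract does not state which transformation the authors used, so this encoding, the label order, and the function names are assumptions for illustration.

```python
# Hypothetical label order; the abstract names the three diseases but not an encoding.
DISEASES = ("diabetes", "hypertension", "fatty_liver")


def to_class_index(flags):
    """Map a tuple of three 0/1 disease flags to one class index in 0..7."""
    index = 0
    for flag in flags:
        index = index * 2 + flag  # read the flags as a binary number
    return index


def from_class_index(index):
    """Recover the three 0/1 disease flags from a class index in 0..7."""
    return tuple((index >> shift) & 1 for shift in (2, 1, 0))


# A patient with diabetes and fatty liver but no hypertension.
patient = (1, 0, 1)
class_index = to_class_index(patient)
print(class_index, from_class_index(class_index))

# Enumerate all label combinations, including the disease-free class 0.
all_classes = sorted(
    to_class_index((d, h, f)) for d in (0, 1) for h in (0, 1) for f in (0, 1)
)
print(all_classes)
```

With this encoding an ordinary single-label classifier can be trained on eight classes, whereas an algorithm-adaptation method would predict the three flags directly.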
38. Mechanistic roles of microRNAs in hepatocarcinogenesis: A study of thioacetamide with multiple doses and time-points of rats
- Author
-
Harsh Dweep, Yuji Morikawa, Binsheng Gong, Jian Yan, Zhichao Liu, Tao Chen, Halil Bisgin, Wen Zou, Huixiao Hong, Tieliu Shi, Ping Gong, Christina Castro, Takeki Uehara, Yuping Wang, and Weida Tong
- Subjects
Medicine ,Science - Abstract
Abstract Environmental chemicals exposure is one of the primary factors for liver toxicity and hepatocarcinoma. Thioacetamide (TAA) is a well-known hepatotoxicant and could be a liver carcinogen in humans. The discovery of early and sensitive microRNA (miRNA) biomarkers in liver injury and tumor progression could improve cancer diagnosis, prognosis, and management. To study this, we performed next generation sequencing of the livers of Sprague-Dawley rats treated with TAA at three doses (4.5, 15 and 45 mg/kg) and four time points (3-, 7-, 14- and 28-days). Overall, 330 unique differentially expressed miRNAs (DEMs) were identified in the entire TAA-treatment course. Of these, 129 DEMs were found significantly enriched for the “liver cancer” annotation. These results were further complemented by pathway analysis (Molecular Mechanisms of Cancer, p53-, TGF-β-, MAPK- and Wnt-signaling). Two miRNAs (rno-miR-34a-5p and rno-miR-455-3p) out of 48 overlapping DEMs were identified to be early and sensitive biomarkers for TAA-induced hepatocarcinogenicity. We have shown significant regulatory associations between DEMs and TAA-induced liver carcinogenesis at an earlier stage than histopathological features. Most importantly, miR-34a-5p is the most suitable early and sensitive biomarker for TAA-induced hepatocarcinogenesis due to its consistent elevation during the entire treatment course.
- Published
- 2017
- Full Text
- View/download PDF
39. Nanomaterial Databases: Data Sources for Promoting Design and Risk Assessment of Nanomaterials
- Author
-
Zuowei Ji, Wenjing Guo, Sugunadevi Sakkiah, Jie Liu, Tucker A. Patterson, and Huixiao Hong
- Subjects
nanomaterial ,database ,physicochemical property ,bioactivity ,characterization ,Chemistry ,QD1-999 - Abstract
Nanomaterials have drawn increasing attention due to their tunable and enhanced physicochemical and biological performance compared to their conventional bulk materials. Owing to the rapid expansion of the nano-industry, large amounts of data regarding the synthesis, physicochemical properties, and bioactivities of nanomaterials have been generated. These data are a great asset to the scientific community. However, the data are on diverse aspects of nanomaterials and in different sources and formats. To help utilize these data, various databases on specific information of nanomaterials such as physicochemical characterization, biomedicine, and nano-safety have been developed and made available online. Understanding the structure, function, and available data in these databases is needed for scientists to select appropriate databases and retrieve specific information for research on nanomaterials. However, to our knowledge, there is no study to systematically compare these databases to facilitate their utilization in the field of nanomaterials. Therefore, we reviewed and compared eight widely used databases of nanomaterials, aiming to provide the nanoscience community with valuable information about the specific content and function of these databases. We also discuss the pros and cons of these databases, thus enabling more efficient and convenient utilization.
- Published
- 2021
40. Individualized network-based drug repositioning infrastructure for precision oncology in the panomics era.
- Author
Feixiong Cheng, Huixiao Hong, Sheng-Yong Yang, and Yuquan Wei
- Published
- 2017
41. Identification of Epidemiological Traits by Analysis of SARS−CoV−2 Sequences
- Author
Bohu Pan, Zuowei Ji, Sugunadevi Sakkiah, Wenjing Guo, Jie Liu, Tucker A. Patterson, and Huixiao Hong
- Subjects
SARS−CoV−2 ,COVID-19 ,genome ,sequence ,epidemiological trait ,phylogenetic analysis ,Microbiology ,QR1-502
- Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS−CoV−2) has caused the ongoing global COVID-19 pandemic that began in late December 2019. The rapid spread of SARS−CoV−2 is primarily due to person-to-person transmission. To understand the epidemiological traits of SARS−CoV−2 transmission, we conducted phylogenetic analysis on genome sequences from >54K SARS−CoV−2 cases obtained from two public databases. Hierarchical clustering analysis on geographic patterns in the resulting phylogenetic trees revealed a co-expansion tendency of the virus among neighboring countries with diverse sources and transmission routes for SARS−CoV−2. Pairwise sequence similarity analysis demonstrated that SARS−CoV−2 is transmitted locally and evolves during transmission. However, no significant differences were seen among SARS−CoV−2 genomes grouped by host age or sex. Here, our identified epidemiological traits provide information to better prevent transmission of SARS−CoV−2 and to facilitate the development of effective vaccines and therapeutics against the virus.
- Published
- 2021
42. Software-Assisted Pattern Recognition of Persistent Organic Pollutants in Contaminated Human and Animal Food
- Author
Wenjing Guo, Jeffrey Archer, Morgan Moore, Sina Shojaee, Wen Zou, Weigong Ge, Linda Benjamin, Anthony Adeuya, Russell Fairchild, and Huixiao Hong
- Subjects
persistent organic pollutant ,software ,similarity ,congener pattern ,contamination ,Organic chemistry ,QD241-441
- Abstract
Persistent Organic Pollutants (POPs) are a serious food safety concern due to their persistence and toxic effects. To promote food safety and protect human health, it is important to understand the sources of POPs and how to minimize human exposure to these contaminants. The POPs Program within the U.S. Food and Drug Administration (FDA) manually evaluates congener patterns of POPs-contaminated samples and sometimes compares the findings to previously analyzed samples with similar patterns. This manual comparison is time-consuming and depends solely on human expertise. To improve the efficiency of this evaluation, we developed software to assist in identifying potential sources of POPs contamination by detecting similarities between the congener patterns of a contaminated sample and potential environmental source samples. Similarity scores were computed and used to rank potential source samples. The software has been tested on a diverse set of incurred samples by comparing results from the software with those from human experts. We demonstrated that the software provides results consistent with human expert observation. The software also offered the advantage of reliably evaluating a larger sample lot, which increased overall efficiency.
- Published
- 2021
43. In silico identification of genetic mutations conferring resistance to acetohydroxyacid synthase inhibitors: A case study of Kochia scoparia.
- Author
Yan Li, Michael D Netherland, Chaoyang Zhang, Huixiao Hong, and Ping Gong
- Subjects
Medicine ,Science
- Abstract
Mutations that confer herbicide resistance are a primary concern for herbicide-based chemical control of invasive plants and are often under-characterized structurally and functionally. As the outcome of selection pressure, resistance mutations usually result from repeated long-term applications of herbicides with the same mode of action and are discovered through extensive field trials. Here we used acetohydroxyacid synthase (AHAS) of Kochia scoparia (KsAHAS) as an example to demonstrate that, given the sequence of a target protein, the impact of genetic mutations on ligand binding could be evaluated and resistance mutations could be identified using a biophysics-based computational approach. Briefly, the 3D structures of wild-type (WT) and mutated KsAHAS-herbicide complexes were constructed by homology modeling, docking and molecular dynamics simulation. The resistance profile of two AHAS-inhibiting herbicides, tribenuron methyl and thifensulfuron methyl, was obtained by estimating their binding affinity with 29 KsAHAS (1 WT and 28 mutated) using 6 molecular mechanical (MM) and 18 hybrid quantum mechanical/molecular mechanical (QM/MM) methods in combination with three structure sampling strategies. By comparing predicted resistance with experimentally determined resistance in the 29 biotypes of K. scoparia field populations, we identified the best method (i.e., MM-PBSA with single structure) out of all tested methods for the herbicide-KsAHAS system, which exhibited the highest accuracy (up to 100%) in discerning mutations conferring resistance or susceptibility to the two AHAS inhibitors. Our results suggest that the in silico approach has the potential to be widely adopted for assessing mutation-endowed herbicide resistance on a case-by-case basis.
- Published
- 2019
44. Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations
- Author
Sugunadevi Sakkiah, Rebecca Kusko, Bohu Pan, Wenjing Guo, Weigong Ge, Weida Tong, and Huixiao Hong
- Subjects
androgen receptor ,molecular dynamics simulations ,induced molecular docking ,bicalutamide ,agonist ,antagonist ,Therapeutics. Pharmacology ,RM1-950
- Abstract
When a small molecule binds to the androgen receptor (AR), a conformational change can occur which impacts subsequent binding of co-regulator proteins and DNA. In order to accurately study this mechanism, the scientific community needs a crystal structure of the wild-type AR (WT-AR) ligand binding domain bound with antagonist. To address this open need, we leveraged molecular docking and molecular dynamics (MD) simulations to construct a structure of the WT-AR ligand binding domain bound with the antagonist bicalutamide. The structure of mutant AR (Mut-AR) bound with this same antagonist informed this study. After molecular docking analysis pinpointed the suitable binding orientation of a ligand in AR, the model was further optimized through 1 μs of MD simulations. Using this approach, three molecular systems were studied: (1) WT-AR bound with agonist R1881, (2) WT-AR bound with antagonist bicalutamide, and (3) Mut-AR bound with bicalutamide. Our structures were very similar to the experimentally determined structures of both WT-AR with R1881 and Mut-AR with bicalutamide, demonstrating the trustworthiness of this approach. In our model, when WT-AR is bound with bicalutamide, Val716/Lys720/Gln733 or Met734/Gln738/Glu897 move and thus disturb the positive and negative charge clumps of the AF2 site. This disruption of the AF2 site is key for understanding the impact of antagonist binding on subsequent co-regulator binding. In conclusion, the antagonist-induced structural changes in WT-AR detailed in this study will enable further AR research and will facilitate AR-targeting drug discovery.
- Published
- 2018
45. Integrative approaches for studying the role of noncoding RNAs in influencing drug efficacy and toxicity
- Author
Dongying Li, Minjun Chen, Huixiao Hong, Weida Tong, and Baitang Ning
- Subjects
Pharmacology ,RNA, Untranslated ,Databases, Factual ,Humans ,General Medicine ,Toxicology ,Algorithms ,Article
- Abstract
INTRODUCTION: Drug efficacy and toxicity are important factors for evaluation in drug development. Drug metabolizing enzymes and transporters (DMETs) play an essential role in drug efficacy and toxicity. Noncoding RNAs (ncRNAs) have been implicated in influencing inter-individual variations in drug efficacy and safety by regulating DMETs. An efficient strategy is urgently needed to identify and functionally characterize ncRNAs that mediate drug efficacy and toxicity through regulating DMETs. AREAS COVERED: We outline an integrative strategy to identify ncRNAs that modulate DMETs. We include reliable tools and databases for computational prediction of ncRNA targets, with regard to their advantages and limitations. Various biochemical, molecular, and cellular assays are discussed for in vitro experimental verification of the regulatory function of ncRNAs. In vivo approaches for association of ncRNAs with drug treatment and toxicity are also reviewed. EXPERT OPINION: A streamlined integration of computational prediction and wet-lab validation is important to elucidate mechanisms of ncRNAs in the regulation of DMETs related to drug efficacy and safety. Bioinformatic analyses using open-access tools and databases serve as a powerful booster for ncRNA research in toxicology. Further refinement of computational algorithms and experimental technologies is needed to improve accuracy and efficiency in ncRNA target identification and characterization.
- Published
- 2022
46. Machine Learning for Predicting Gas Adsorption Capacities of Metal Organic Framework
- Author
Wenjing Guo, Jie Liu, Fan Dong, Tucker A. Patterson, and Huixiao Hong
- Published
- 2023
47. Quantitative Target-specific Toxicity Prediction Modeling (QTTPM): Coupling Machine Learning with Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) to Predict Androgen Receptor-mediated Toxicity
- Author
Sundar Thangapandian, Gabriel Idakwo, Joseph Luttrell, Huixiao Hong, Chaoyang Zhang, and Ping Gong
- Published
- 2023
48. Computational Modeling for the Prediction of Hepatotoxicity Caused by Drugs and Chemicals
- Author
Minjun Chen, Jie Liu, Tsung-Jen Liao, Kristin Ashby, Yue Wu, Leihong Wu, Weida Tong, and Huixiao Hong
- Published
- 2023
49. Machine Learning for Predicting Organ Toxicity
- Author
Jie Liu, Wenjing Guo, Fan Dong, Tucker A. Patterson, and Huixiao Hong
- Published
- 2023
50. Mold2 Descriptors Facilitate Development of Machine Learning and Deep Learning Models for Predicting Toxicity of Chemicals
- Author
Huixiao Hong, Jie Liu, Weigong Ge, Sugunadevi Sakkiah, Wenjing Guo, Gokhan Yavas, Chaoyang Zhang, Ping Gong, Weida Tong, and Tucker A. Patterson
- Published
- 2023