341 results on '"Phenotype prediction"'
Search Results
2. Human limits in machine learning: prediction of potato yield and disease using soil microbiome data.
- Author
Aghdam, Rosa, Tang, Xudong, Shan, Shan, Lankau, Richard, and Solís-Lemus, Claudia
- Subjects
*MACHINE learning, *SOIL management, *AGRICULTURE, *PLANT performance, *BAYESIAN analysis
- Abstract
Background: The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide one of the first comprehensive investigations into the predictive potential of machine learning models for understanding the connections between soil and biological phenotypes. We investigate an integrative framework performing accurate machine learning-based prediction of plant performance from biological, chemical, and physical properties of the soil via two models: random forest and Bayesian neural network. Results: Prediction improves when we add environmental features, such as soil properties and microbial density, along with microbiome data. Different preprocessing strategies show that human decisions significantly impact predictive performance. We show that the naive total sum scaling normalization that is commonly used in microbiome research is one of the optimal strategies to maximize predictive power. Also, we find that accurately defined labels are more important than normalization, taxonomic level, or model characteristics. ML performance is limited when humans can't classify samples accurately. Lastly, we provide domain scientists with a full model selection decision tree to identify the human choices that optimize model prediction power. Conclusions: Our study highlights the importance of incorporating diverse environmental features and careful data preprocessing in enhancing the predictive power of machine learning models for soil and biological phenotype connections. This approach can significantly contribute to advancing agricultural practices and soil health management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
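The total sum scaling (TSS) normalization mentioned in this abstract divides each sample's taxon counts by that sample's total count, turning raw counts into relative abundances. A minimal Python sketch of that idea follows; the function name and the toy counts are illustrative assumptions, not the authors' code.

```python
def total_sum_scaling(counts):
    """Rescale each sample (row) of taxon counts so that the row sums to 1.

    `counts` is a list of rows: one row per sample, one column per taxon.
    """
    scaled = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A sample with no reads has no defined relative abundances.
            raise ValueError("cannot scale a sample whose total count is zero")
        scaled.append([c / total for c in row])
    return scaled


if __name__ == "__main__":
    raw = [[10, 30, 60], [5, 5, 0]]
    print(total_sum_scaling(raw))  # [[0.1, 0.3, 0.6], [0.5, 0.5, 0.0]]
```

TSS ignores differences in sequencing depth entirely, which is why it is often called "naive"; the abstract's point is that this simple choice still performed among the best for prediction.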
4. Detecting outliers in case-control cohorts for improving deep learning networks on Schizophrenia prediction
- Author
Martins Daniel, Abbasi Maryam, Egas Conceição, and Arrais Joel P.
- Subjects
machine learning, deep learning, phenotype prediction, schizophrenia, Biotechnology, TP248.13-248.65
- Abstract
This study delves into the intricate genetic and clinical aspects of Schizophrenia (SCZ), a complex mental disorder with uncertain etiology. Deep Learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, given reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles that hinder the performance of classification models. The research employed a case-control dataset drawn from the Swedish population. A gene-annotation-based DL architecture was developed and employed in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can affect case-control cohorts, potentially compromising future studies reliant on such datasets. However, by detecting and filtering outliers, the study demonstrates the feasibility of adapting DL methodologies to large-scale biological problems, producing results more aligned with existing heritability estimates for SCZ. This approach not only advances understanding of the genetic background of SCZ but also opens doors for adapting DL techniques in complex research for precision medicine in mental health.
- Published
- 2024
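The two-stage procedure described here (train on all samples, exclude samples likely to be misclassified, retrain on the rest) can be sketched generically. The study used a gene-annotation-based deep learning architecture; the sketch below swaps in a nearest-centroid classifier purely to keep the example short, and it assumes "likely to be misclassified" means "misclassified by the stage-one model", which is an illustrative choice rather than the paper's criterion. All function names are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def two_stage_filter(X, y):
    """Stage 1: fit on every sample and flag the ones the model gets wrong.
    Stage 2: refit on the retained samples only.

    Returns the stage-2 model and the indices of the retained samples.
    """
    stage1 = fit_centroids(X, y)
    keep = [i for i, (xi, yi) in enumerate(zip(X, y)) if predict(stage1, xi) == yi]
    stage2 = fit_centroids([X[i] for i in keep], [y[i] for i in keep])
    return stage2, keep
```

In a toy cohort where one "case" sits among the controls, stage one misclassifies it, stage two drops it, and the case centroid is then estimated from the remaining cases only. Performance of the refit model should be evaluated on held-out data, not on the filtered training set.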
5. BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data.
- Author
Zhang, Shunjie, Li, Pan, Wang, Shenghan, Zhu, Jijun, Huang, Zhongting, Cai, Fuqiang, Freidel, Sebastian, Ling, Fei, Schwarz, Emanuel, and Chen, Junfang
- Subjects
*MACHINE learning, *INDEPENDENT variables, *DNA methylation, *GENE ontology, *GENE expression
- Abstract
Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. The integration of biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and generalizable. However, the wider availability of machine learning tools capable of incorporating such biological knowledge remains limited. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uniquely leverages biological information to effectively stratify and aggregate high-dimensional biological data in the context of machine learning. Demonstrating its utility with genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids in the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the intricate systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework termed BioM2 for phenotype prediction based on omics data. BioM2 has been incorporated into the BioM2 CRAN package (https://cran.r-project.org/web/packages/BioM2/index.html). [ABSTRACT FROM AUTHOR]
- Published
- 2024
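BioM2 stratifies predictors by biological category (Gene Ontology pathways) before modelling. As a much simplified, hypothetical illustration of that grouping step only, the sketch below collapses feature-level values (for example CpG methylation levels) into one mean score per pathway, which a second-stage model could then take as input. BioM2's actual stage-one procedure is not described in the abstract and is not reproduced here; the pathway IDs and feature names are made up.

```python
def pathway_scores(sample, feature_names, pathways):
    """Aggregate one sample's features into a mean score per pathway.

    sample        -- feature values for one individual, aligned with feature_names
    feature_names -- identifiers of the measured features
    pathways      -- mapping from pathway ID to the feature IDs it contains
    Pathway members that were not measured are ignored; a pathway with no
    measured members gets a score of 0.0 (an arbitrary placeholder choice).
    """
    index = {name: i for i, name in enumerate(feature_names)}
    scores = {}
    for pathway, members in pathways.items():
        values = [sample[index[m]] for m in members if m in index]
        scores[pathway] = sum(values) / len(values) if values else 0.0
    return scores


if __name__ == "__main__":
    names = ["cg1", "cg2", "cg3"]                               # hypothetical features
    groups = {"GO:A": ["cg1", "cg2"], "GO:B": ["cg3", "cg9"]}   # hypothetical pathways
    print(pathway_scores([1.0, 3.0, 5.0], names, groups))       # {'GO:A': 2.0, 'GO:B': 5.0}
```

Reducing thousands of features to one value per pathway is what makes ranking "predictor variables within biological categories" tractable; a plain mean is only the crudest possible aggregator.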
6. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning.
- Author
Monti, Remo, Eick, Lisa, Hudjashov, Georgi, Läll, Kristi, Kanoni, Stavroula, Wolford, Brooke N., Wingfield, Benjamin, Pain, Oliver, Wharrie, Sophie, Jermy, Bradley, McMahon, Aoife, Hartonen, Tuomo, Heyne, Henrike, Mars, Nina, Lambert, Samuel, Hveem, Kristian, Inouye, Michael, van Heel, David A., Mägi, Reedik, and Marttinen, Pekka
- Subjects
*BIOBANKS, *GENOME-wide association studies, *TYPE 1 diabetes, *RHEUMATOID arthritis, *AUTOIMMUNE diseases
- Abstract
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks. Systematic evaluation of polygenic scoring methods in 1.2 million individuals across five biobanks finds that no single method performs best. Performance varied more between biobanks than between methods, suggesting that future research should address between-biobank variability. Ensembles provided high, robust, and transferable performance. Workflow and results browser are open source.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
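The ensemble PGS described here is an elastic net regression that takes the scores produced by several PGS methods as predictors. The sketch below fits such weights by plain cyclic coordinate descent on a squared-error loss, i.e. for a quantitative trait; a binary disease outcome would typically call for a logistic loss instead. It assumes standardized PGS columns and a centered outcome, the function names are hypothetical, and it does not reproduce the study's prspipe workflow.

```python
def soft_threshold(value, threshold):
    """Lasso shrinkage: move `value` toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_ensemble(pgs_columns, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Learn one weight per PGS method by cyclic coordinate descent.

    Minimises (1/(2n)) * ||y - Xw||^2
              + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||^2),
    where column j of X is pgs_columns[j]. Assumes standardised columns and a
    centred outcome y, so no intercept is fitted.
    """
    n = len(y)
    weights = [0.0] * len(pgs_columns)
    residual = list(y)  # y - Xw, starting from w = 0
    for _ in range(n_iter):
        for j, col in enumerate(pgs_columns):
            # Correlation of column j with the residual that excludes column j itself.
            rho = sum(x * (r + x * weights[j]) for x, r in zip(col, residual)) / n
            z = sum(x * x for x in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            # Keep the residual in sync with the updated weight.
            delta = new_w - weights[j]
            residual = [r - x * delta for x, r in zip(col, residual)]
            weights[j] = new_w
    return weights


def ensemble_score(pgs_values, weights):
    """Combine one person's per-method PGS values into a single ensemble PGS."""
    return sum(v * w for v, w in zip(pgs_values, weights))
```

With `alpha=0` the weights reduce to ordinary least-squares coefficients; increasing `alpha` shrinks them, and the L1 part can set the weight of a redundant method exactly to zero. In the study the penalty strength was tuned in the UK Biobank; here `alpha` and `l1_ratio` are simply arguments.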
7. Investigation of gut microbiota disorders in norovirus infected children patients based on 16s rRNA sequencing
- Author
Jie Li, Nan Jiang, Hui Zheng, Xiao Zheng, Yi Xu, Yongqing Weng, Feijian Jiang, Chong Wang, and Peiliang Chang
- Subjects
Norovirus infection, gut microbiota, functional prediction, phenotype prediction, Medicine
- Abstract
Background: Norovirus is the leading cause of sporadic viral gastroenteritis cases and outbreaks. Gut microbiota plays a key role in maintaining immune homeostasis. We aimed to investigate the composition and functional effects of gut microbiota in children infected with norovirus. Methods: Stool samples were collected from 31 children infected with norovirus and 25 healthy children. The gut microbiota was analyzed by 16S rRNA gene sequencing, followed by composition, correlation network, functional and phenotype prediction analyses. Results: Gut microbiota in children infected with norovirus was characterized by lower species richness and diversity. Veillonella was the dominant gut microbial taxon in norovirus infection. Blautia was significantly lower in norovirus infection. There were positive correlations among Faecalibacterium, Blautia, Subdoligranulum, Eubacterium_hallii_group, Fusicatenibacter, Agathobacter, Roseburia and Dorea. Functionally, secondary metabolites biosynthesis, transport and catabolism, selenocysteine lyase and peroxiredoxin were the most significantly increased functional categories of gut microbiota in norovirus infection. However, sn-glycerol-1-phosphate dehydrogenase and fermentation were the most significantly decreased functional categories in the norovirus infection group. Phenotype analysis showed that Contains_Mobile_Elements had the highest level among the predicted phenotypes in the gut microbiota of norovirus infection. Conclusion: Norovirus infection may lead to dysregulation of the gut microbiome in children.
- Published
- 2024
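The abstract reports lower species richness and diversity without naming the indices used. As an assumption, the sketch below uses observed richness (the number of taxa with non-zero counts) and the Shannon index H = -sum(p_i * ln p_i), two common choices in 16S rRNA studies.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

Two equally abundant taxa give H = ln 2 (about 0.693), and a single taxon gives H = 0; lower values in infected children would match the reduced diversity described above.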
8. Genomic Selection for Phenotype Prediction in Rice
- Author
Muthazhagu Kuppuraj, Sakthi Anand, Ramadoss, Bharathi Raja, Adhimoolam, Karthikeyan, Vedachalam, Vengadessan, Murugesan, Tamilzharasi, Tamilselvan, Anandhan, Singh, Akansha, editor, Singh, Shravan Kumar, editor, and Shrestha, Jiban, editor
- Published
- 2024
9. Evaluation of normalization methods for predicting quantitative phenotypes in metagenomic data analysis.
- Author
Beibei Wang and Yihui Luan
- Subjects
METAGENOMICS, PHENOTYPES, STANDARD deviations, EVALUATION methodology, DATA analysis, FORECASTING
- Abstract
Genotype-to-phenotype mapping is an essential problem in the current genomic era. While qualitative case-control predictions have received significant attention, less emphasis has been placed on predicting quantitative phenotypes. This emerging field holds great promise in revealing intricate connections between microbial communities and host health. However, the presence of heterogeneity in microbiome datasets poses a substantial challenge to the accuracy of predictions and undermines the reproducibility of models. To tackle this challenge, we investigated 22 normalization methods that aimed at removing heterogeneity across multiple datasets, conducted a comprehensive review of them, and evaluated their effectiveness in predicting quantitative phenotypes in three simulation scenarios and 31 real datasets. The results indicate that none of these methods demonstrate significant superiority in predicting quantitative phenotypes or attain a noteworthy reduction in Root Mean Squared Error (RMSE) of the predictions. Given the frequent occurrence of batch effects and the satisfactory performance of batch correction methods in predicting datasets affected by these effects, we strongly recommend utilizing batch correction methods as the initial step in predicting quantitative phenotypes. In summary, the performance of normalization methods in predicting metagenomic data remains a dynamic and ongoing research area. Our study contributes to this field by undertaking a comprehensive evaluation of diverse methods and offering valuable insights into their effectiveness in predicting quantitative phenotypes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
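The authors recommend batch correction as the first step, but the abstract does not name the correction methods they evaluated. The simplest form of the idea is per-batch mean-centering of each feature, sketched below as an illustrative stand-in; established tools such as ComBat go further and also adjust batch-specific variances. The function names are hypothetical.

```python
from collections import defaultdict


def batch_mean_center(values, batches):
    """Subtract each batch's mean from the values of one feature."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, batch in zip(values, batches):
        sums[batch] += value
        counts[batch] += 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]


def center_matrix(rows, batches):
    """Apply batch_mean_center to every feature (column) of a samples x features matrix."""
    columns = [batch_mean_center(list(col), batches) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

Mean-centering removes additive batch shifts but also removes any true biological difference that happens to be confounded with batch, so it is only safe when the phenotype is balanced across batches.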
10. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes.
- Author
Hu, Kaixin, Meyer, Fernando, Deng, Zhi-Luo, Asgari, Ehsaneddin, Kuo, Tzu-Hao, Münch, Philipp C, and McHardy, Alice C
- Subjects
*MICROBIAL genomes, *DRUG resistance in microorganisms, *PHENOTYPES, *ENTEROCOCCUS faecium, *ACINETOBACTER baumannii, *NUCLEOTIDE sequencing, *STREPTOCOCCUS pneumoniae, *MYCOBACTERIUM tuberculosis
- Abstract
The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species–antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species–antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping for the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, as well as for tetracyclines, showed more variable performance than for the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species–antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction
- Author
Wei Zhou, Zhengxiao Yan, and Liting Zhang
- Subjects
Phenotype prediction, Non-linear regression, Artificial intelligence, Feature importance, Medicine, Science
- Abstract
To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. Using branching data (phenotype) of 1918 soybean accessions and 42k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). Evaluation with four metrics, R2 (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), showed that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better prediction accuracy for phenotype prediction. In the assessment of deep learning approaches, we took the SVR model as an example, conducting analyses of feature importance and gene ontology (GO) enrichment to provide comprehensive support. A comprehensive comparison of four feature importance algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, showed no notable difference in their feature importance ranking scores, but SHAP values provided rich information on genes with negative contributions, so SHAP importance was chosen for feature selection.
The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.
- Published
- 2024
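For reference, the four evaluation metrics named in this abstract (R2, MAE, MSE, MAPE) can be computed as below. This is a generic sketch, not the authors' evaluation code; MAPE is undefined when any true value is zero, and R2 is undefined when all true values are identical.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE and MAPE (in percent) for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                            # fails if all y_true are equal
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # fails if any t == 0
    return {"R2": r2, "MAE": mae, "MSE": mse, "MAPE": mape}
```

For true values [2, 4, 6, 8] and predictions [3, 4, 5, 8], this gives MAE = 0.5, MSE = 0.5, R2 = 0.9 and MAPE of about 16.7%.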
12. Predicting Archaic Hominin Phenotypes from Genomic Data
- Author
Brand, Colin M, Colbran, Laura L, and Capra, John A
- Subjects
Human Genome, Genetics, Generic health relevance, Animals, DNA, Ancient, Genome, Human, Genomics, Hominidae, Humans, Neanderthals, Phenotype, ancient DNA, archaic hominin, Denisovan, Neanderthal, phenotype prediction, Evolutionary Biology, Law, Genetics & Heredity
- Abstract
Ancient DNA provides a powerful window into the biology of extant and extinct species, including humans' closest relatives: Denisovans and Neanderthals. Here, we review what is known about archaic hominin phenotypes from genomic data and how those inferences have been made. We contend that understanding the influence of variants on lower-level molecular phenotypes, such as gene expression and protein function, is a promising approach to using ancient DNA to learn about archaic hominin traits. Molecular phenotypes have simpler genetic architectures than organism-level complex phenotypes, and this approach enables moving beyond association studies by proposing hypotheses about the effects of archaic variants that are testable in model systems. The major challenge to understanding archaic hominin phenotypes is broadening our ability to accurately map genotypes to phenotypes, but ongoing advances ensure that there will be much more to learn about archaic hominin phenotypes from their genomes.
- Published
- 2022
14. DeepMPTB: a vaginal microbiome-based deep neural network as artificial intelligence strategy for efficient preterm birth prediction
- Author
Chakoory, Oshma, Barra, Vincent, Rochette, Emmanuelle, Blanchon, Loïc, Sapin, Vincent, Merlin, Etienne, Pons, Maguelonne, Gallot, Denis, Comtet-Marre, Sophie, and Peyret, Pierre
- Published
- 2024
15. Should we really use graph neural networks for transcriptomic prediction?
- Author
Brouard, Céline, Mourad, Raphaël, and Vialaneix, Nathalie
- Subjects
*GRAPH neural networks, *DEEP learning, *MACHINE learning, *GENE regulatory networks, *TRANSCRIPTOMES, *GENETIC regulation
- Abstract
The recent development of deep learning methods has undoubtedly led to great improvements in various machine learning tasks, especially prediction tasks. These methods have also been adapted to address various problems in bioinformatics, including automatic genome annotation, artificial genome generation and phenotype prediction. In particular, a specific type of deep learning method, the graph neural network (GNN), has repeatedly been reported as a good candidate for predicting phenotypes from gene expression because of its ability to embed information on gene regulation or co-expression through the use of a gene network. However, to date, no complete and reproducible benchmark has been performed to analyze the trade-off between the cost and benefit of this approach compared with more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNNs rarely provide a real improvement in prediction performance, especially when compared to the computational effort they require. Our findings on a limited but controlled simulated dataset show that this could be explained by the limited quality or predictive power of the input biological gene network itself. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Eye and hair color prediction of human DNA recovered from Lucilia sericata larvae.
- Author
Deymenci, Emre, Sarı O, Ilksen, Filoglu, Gonul, Polat, Erdal, and Bulbul, Ozlem
- Subjects
*HUMAN hair color, *EYE color, *HUMAN DNA, *MAGGOT therapy, *DNA analysis, *DEBRIDEMENT, *HAIR analysis
- Abstract
Forensic entomological evidence is employed to estimate minimum postmortem interval (PMImin), location, and identification of fly samples or human remains. Traditional forensic DNA analysis (i.e., STR, mitochondrial DNA) has been used for human identification from the larval gut contents. Forensic DNA phenotyping (FDP), predicting human appearance from DNA-based crime scene evidence, has become an established approach in forensic genetics in the past years. In this study, we aimed to recover human DNA from Lucilia sericata (Meigen 1826) (Diptera: Calliphoridae) gut contents and predict the eye and hair color of individuals using the HIrisPlex system. Lucilia sericata larvae and reference blood samples were collected from 30 human volunteers undergoing maggot debridement therapy. The human DNA was extracted from the crop contents and quantified. HIrisPlex multiplex analysis was performed using the SNaPshot minisequencing procedure. The HIrisPlex online tool was used to assess the prediction of the eye and hair color of the larval and reference samples. We successfully genotyped 25 out of 30 larval samples, and most SNP genotypes (87.13%) matched those of reference samples, though some alleles dropped out, producing partial profiles. The prediction of eye color was accurate in 17 out of 25 larval samples, and only one sample was misclassified. Fourteen out of 25 larval samples were correctly predicted for hair color, and eight were misclassified. This study shows that SNP analysis of L. sericata gut contents can be used to predict the eye and hair color of a corpse. [ABSTRACT FROM AUTHOR]
- Published
- 2024
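The 87.13% figure is a genotype concordance between larval and reference profiles. The abstract does not say how dropped-out loci were counted, so the sketch below makes one explicit, hypothetical choice: loci that failed entirely are skipped, while partial calls (a lost allele, written here as a single letter) count as mismatches. The locus names are placeholders, not real rsIDs.

```python
def genotype_concordance(sample_calls, reference_calls):
    """Percentage of compared SNP loci whose sample genotype matches the reference.

    Genotypes are strings of allele letters (e.g. "AG"); allele order is ignored.
    Loci missing from `sample_calls` (complete failures) are skipped, while
    partial calls such as "C" against reference "CC" count as mismatches.
    """
    compared = matched = 0
    for locus, ref in reference_calls.items():
        call = sample_calls.get(locus)
        if call is None:
            continue  # locus not genotyped in the sample: excluded from the denominator
        compared += 1
        if sorted(call) == sorted(ref):
            matched += 1
    if compared == 0:
        raise ValueError("no loci could be compared")
    return 100.0 * matched / compared
```

The same ratio arithmetic gives the prediction accuracies reported here: 17 of 25 larval samples (68%) for eye color and 14 of 25 (56%) for hair color.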
17. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project
- Author
Dmitry Kolobkov, Satyarth Mishra Sharma, Aleksandr Medvedev, Mikhail Lebedev, Egor Kosaretskiy, and Ruslan Vakhitov
- Subjects
federated learning (FL), phenotype prediction, ancestry prediction, machine learning, data collaboration, genomics, Information technology, T58.5-58.64
- Abstract
Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
- Published
- 2024
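The abstract does not name the aggregation algorithm. As an assumption, the sketch below uses federated averaging (FedAvg), a common scheme: each node runs a few local gradient steps on its own data, and only model weights, never raw genotypes, are averaged centrally in proportion to node sample size. `local_epochs` stands in for communication frequency, since more local work between aggregations can reduce how often nodes must communicate. A linear model on toy data replaces the study's models, and all names are hypothetical.

```python
def local_gradient_steps(weights, X, y, lr=0.1, epochs=1):
    """Full-batch gradient descent on half the mean squared error of a
    linear model, using only one node's private data."""
    w = list(weights)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fed_avg(node_weights, node_sizes):
    """Average node models, weighting each by its number of samples."""
    total = sum(node_sizes)
    dim = len(node_weights[0])
    return [sum(w[j] * size for w, size in zip(node_weights, node_sizes)) / total
            for j in range(dim)]


def federated_training(nodes, dim, rounds=50, local_epochs=1, lr=0.1):
    """nodes: list of (X, y) pairs that never leave their node."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Each node starts from the current global model and trains locally.
        updates = [local_gradient_steps(global_w, X, y, lr, local_epochs) for X, y in nodes]
        global_w = fed_avg(updates, [len(y) for _, y in nodes])
    return global_w
```

On two toy nodes whose data both follow y = 2x, the federated model converges to a weight of about 2.0, the same answer a centralized fit would give. With heterogeneous nodes the two need not coincide exactly, which is the gap the study measures.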
18. 16S rDNA Sequencing-Based Insights into the Bacterial Community Structure and Function in Co-Existing Soil and Coal Gangue.
- Author
Ruan, Mengying, Hu, Zhenqi, Zhu, Qi, Li, Yuanyuan, and Nie, Xinran
- Subjects
BACTERIAL communities, COAL, BIOFILMS, IN situ remediation, RECOMBINANT DNA, SOIL microbiology
- Abstract
Coal gangue is a solid waste emitted during coal production. Coal gangue is deployed adjacent to mining land and has characteristics similar to those of the soils of these areas. Coal gangue–soil ecosystems provide habitats for a rich and active bacterial community. However, co-existence networks and the functionality of soil and coal gangue bacterial communities have not been studied. Here, we performed Illumina MiSeq high-throughput sequencing, symbiotic network and statistical analyses, and microbial phenotype prediction to study the microbial community in coal gangue and soil samples from Shanxi Province, China. In general, the structural difference between the bacterial communities in coal gangue and soil was large, indicating that interactions between soil and coal gangue are limited but not absent. The bacterial community exhibited a significant symbiosis network in soil and coal gangue. The co-occurrence network was primarily formed by Proteobacteria, Firmicutes, and Actinobacteria. In addition, BugBase microbiome phenotype predictions and PICRUSt bacterial functional potential predictions showed that transcription regulators represented the highest functional category of symbiotic bacteria in soil and coal gangue. Proteobacteria played an important role in various processes such as mobile element pathogenicity, oxidative stress tolerance, and biofilm formation. In general, this work provides a theoretical basis and data support for the in situ remediation of acidified coal gangue hills based on microbiological methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
19. FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms.
- Author
Mowlaei, Mohammad Erfan and Shi, Xinghua
- Subjects
*FEATURE selection, *PHENOTYPES, *PHENOTYPIC plasticity, *FORECASTING, *GENOTYPES, *GENETIC algorithms
- Abstract
(1) Background: Phenotype prediction is a pivotal task in genetics in order to identify how genetic factors contribute to phenotypic differences. This field has seen extensive research, with numerous methods proposed for predicting phenotypes. Nevertheless, the intricate relationship between genotypes and complex phenotypes, including common diseases, has resulted in an ongoing challenge to accurately decipher the genetic contribution. (2) Results: In this study, we propose a novel feature selection framework for phenotype prediction utilizing a genetic algorithm (FSF-GA) that effectively reduces the feature space to identify genotypes contributing to phenotype prediction. We provide a comprehensive vignette of our method and conduct extensive experiments using a widely used yeast dataset. (3) Conclusions: Our experimental results show that our proposed FSF-GA method delivers phenotype prediction performance comparable to that of baseline methods, while providing features selected for predicting phenotypes. These selected feature sets can be used to interpret the underlying genetic architecture that contributes to phenotypic variation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
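FSF-GA's exact encoding and operators are not given in the abstract. The generic genetic-algorithm feature selection below is an illustrative sketch, not the authors' framework: each candidate is a 0/1 mask over features, parents are chosen by tournament, children are built by one-point crossover and bit-flip mutation, and the best mask is always carried over (elitism). The `fitness` callable is a placeholder where one would plug in, for example, cross-validated prediction accuracy using only the selected features.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Search 0/1 feature masks for high `fitness`; needs n_features >= 2, pop_size >= 2.

    Returns the best mask found and the best fitness after each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Pick two random candidates and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each feature with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history
```

Because of elitism, the best fitness never decreases from one generation to the next. A real fitness function would usually also penalize mask size, so that the selected feature set stays small enough to interpret.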
20. Assessing computational predictions of the phenotypic effect of cystathionine‐beta‐synthase variants
- Author
Kasak, Laura, Bakolitsa, Constantina, Hu, Zhiqiang, Yu, Changhua, Rine, Jasper, Dimster‐Denk, Dago F, Pandey, Gaurav, Baets, Greet, Bromberg, Yana, Cao, Chen, Capriotti, Emidio, Casadio, Rita, Durme, Joost, Giollo, Manuel, Karchin, Rachel, Katsonis, Panagiotis, Leonardi, Emanuela, Lichtarge, Olivier, Martelli, Pier Luigi, Masica, David, Mooney, Sean D, Olatubosun, Ayodeji, Radivojac, Predrag, Rousseau, Frederic, Pal, Lipika R, Savojardo, Castrense, Schymkowitz, Joost, Thusberg, Janita, Tosatto, Silvio CE, Vihinen, Mauno, Väliaho, Jouni, Repo, Susanna, Moult, John, Brenner, Steven E, and Friedberg, Iddo
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Networking and Information Technology R&D (NITRD), Aetiology, 2.1 Biological and endogenous factors, Generic health relevance, Good Health and Well Being, Amino Acid Substitution, Computational Biology, Cystathionine, Cystathionine beta-Synthase, Homocysteine, Humans, Phenotype, Precision Medicine, CAGI challenge, critical assessment, cystathionine-beta-synthase, machine learning, phenotype prediction, single amino acid substitution, Clinical Sciences, Genetics & Heredity, Clinical sciences
- Abstract
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine; variations in CBS are associated with human hyperhomocysteinemia and homocystinuria. We created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss how the difficulty of prediction varied between known benign and deleterious variants, and identify methodological and experimental constraints, with lessons for future challenges.
- Published
- 2019
21. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction
- Author
LaPierre, Nathan, Ju, Chelsea J-T, Zhou, Guangyu, and Wang, Wei
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Human Genome, Obesity, Data Science, Networking and Information Technology R&D (NITRD), Microbiome, Machine Learning and Artificial Intelligence, Genetics, Good Health and Well Being, Algorithms, Deep Learning, Diabetes Mellitus, Type 2, Humans, Machine Learning, Metagenome, Metagenomics, Microbiota, Deep learning, Machine learning, Phenotype prediction, Clinical Sciences, Biochemistry and cell biology
- Abstract
The human microbiome plays a number of critical roles, impacting almost every aspect of human health and well-being. Conditions in the microbiome have been linked to a number of significant diseases. Additionally, revolutions in sequencing technology have led to a rapid increase in publicly-available sequencing data. Consequently, there have been growing efforts to predict disease status from metagenomic sequencing data, with a proliferation of new approaches in the last few years. Some of these efforts have explored utilizing a powerful form of machine learning called deep learning, which has been applied successfully in several biological domains. Here, we review some of these methods and the algorithms that they are based on, with a particular focus on deep learning methods. We also perform a deeper analysis of Type 2 Diabetes and obesity datasets that have eluded improved results, using a variety of machine learning and feature extraction methods. We conclude by offering perspectives on study design considerations that may impact results and future directions the field can take to improve results and offer more valuable conclusions. The scripts and extracted features for the analyses conducted in this paper are available via GitHub: https://github.com/nlapier2/metapheno.
- Published
- 2019
22. Assessment of deep learning and transfer learning for cancer prediction based on gene expression data
- Author
Blaise Hanczar, Victoria Bourgeais, and Farida Zehraoui
- Subjects
Deep neural network, Transfer learning, Phenotype prediction, Gene expression, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new for this task, and there is no clear consensus about its performance and utility. Few experimental works have evaluated deep neural networks and compared them with state-of-the-art machine learning. Moreover, their conclusions are not consistent. Results: We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performance of neural networks with the state of the art. We also investigate the effectiveness of several transfer learning schemes in different experimental setups. Conclusion: Based on our experiments, we provide several recommendations to optimize the construction and training of a neural network model. We show that neural networks outperform the state-of-the-art methods only for very large training set sizes. For a small training set, we show that transfer learning is possible and may strongly improve model performance in some cases.
- Published
- 2022
23. A wheat integrative regulatory network from large-scale complementary functional datasets enables trait-associated gene discovery for crop improvement.
- Author
Chen, Yongming, Guo, Yiwen, Guan, Panfeng, Wang, Yongfa, Wang, Xiaobo, Wang, Zihao, Qin, Zhen, Ma, Shengwei, Xin, Mingming, Hu, Zhaorong, Yao, Yingyin, Ni, Zhongfu, Sun, Qixin, Guo, Weilong, and Peng, Huiru
- Abstract
Gene regulation is central to all aspects of organism growth, and understanding it using large-scale functional datasets can provide a comprehensive view of the biological processes controlling complex phenotypic traits in crops. However, the connection between massive functional datasets and trait-associated gene discovery for crop improvement is still lacking. In this study, we constructed a wheat integrative gene regulatory network (wGRN) by combining an updated genome annotation and diverse complementary functional datasets, including gene expression, sequence motif, transcription factor (TF) binding, chromatin accessibility, and evolutionarily conserved regulation. wGRN contains 7.2 million genome-wide interactions covering 5947 TFs and 127 439 target genes, which were further verified using known regulatory relationships, condition-specific expression, gene functional information, and experiments. We used wGRN to assign genome-wide genes to 3891 specific biological pathways and accurately prioritize candidate genes associated with complex phenotypic traits in genome-wide association studies. In addition, wGRN was used to enhance the interpretation of a spike temporal transcriptome dataset to construct high-resolution networks. We further unveiled novel regulators that enhance the power of spike phenotypic trait prediction using machine learning and contribute to the spike phenotypic differences among modern wheat accessions. Finally, we developed an interactive webserver, wGRN (http://wheat.cau.edu.cn/wGRN), for the community to explore gene regulation and discover trait-associated genes. Collectively, this community resource establishes the foundation for using large-scale functional datasets to guide trait-associated gene discovery for crop improvement. This study constructs a wheat integrative gene regulatory network by combining diverse complementary functional datasets.
An interactive platform, wGRN, that uses large-scale functional datasets to guide trait-associated gene discovery for crop improvement has been built. With wGRN, some novel regulators of wheat spike traits are unveiled. [ABSTRACT FROM AUTHOR]
- Published
- 2023
24. Towards a robust out-of-the-box neural network model for genomic data
- Author
Zhaoyi Zhang, Songyang Cheng, and Claudia Solis-Lemus
- Subjects
Generalization error, Phenotype prediction, Convolutional, Natural language processing, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. Results: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. In particular, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. Conclusions: While a robust out-of-the-box neural network model remains out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers.
- Published
- 2022
25. Predicting phenotypes of beef eating quality traits
- Author
Mehrnush Forutan, Andrew Lynn, Hassan Aliloo, Samuel A. Clark, Peter McGilchrist, Rod Polkinghorne, and Ben J. Hayes
- Subjects
BayesR, beef cattle, eating quality, high-density SNP, phenotype prediction, Genetics, QH426-470
- Abstract
Introduction: Phenotype predictions of beef eating quality for individual animals could be used to allocate animals predicted to have higher eating quality to longer and more expensive feeding regimes as they enter the feedlot, and to sort carcasses into consumer or market value categories. Phenotype predictions can include genetic effects (breed effects, heterosis and breeding value) predicted from genetic markers, as well as fixed effects such as days aged, carcass weight, hump height, ossification, and hormone growth promotant (HGP) status. Methods: Here we assessed the accuracy of phenotype predictions for five eating quality traits (tenderness, juiciness, flavour, overall liking and MQ4) in striploins from 1701 animals from a wide variety of backgrounds, including Bos indicus and Bos taurus breeds, using genotypes and simple fixed effects including days aged and carcass weight. The genetic components were predicted from 709k single nucleotide polymorphisms (SNPs) using the BayesR model, which assumes that some markers may have moderate to large effects. Fixed effects in the prediction included principal components of the genomic relationship matrix, to account for breed effects, heterosis, days aged and carcass weight. Results and Discussion: A model that allowed breed effects to be captured in the SNP effects (i.e., not explicitly fitting these effects) tended to have slightly higher accuracies (0.43–0.50) than when these effects were explicitly fitted as fixed effects (0.42–0.49), perhaps because explicitly fitted breed effects were estimated with more error than when incorporated into the (random) SNP effects. Adding estimates of the effects of days aged and carcass weight did not increase the accuracy of phenotype predictions in this particular analysis.
The accuracy of phenotype prediction for beef eating quality traits was sufficiently high that such predictions could be useful in predicting eating quality from DNA samples taken from an animal/carcass as it enters the processing plant, to enable optimal supply chain value extraction by sorting product into markets with different quality. The BayesR predictions identified several novel genes potentially associated with beef eating quality.
- Published
- 2023
26. Machine learning applications for transcription level and phenotype predictions.
- Author
Chantaraamporn, Juthamard, Phumikhet, Pongpannee, Nguantad, Sarintip, Techo, Todsapol, and Charoensawan, Varodom
- Subjects
Machine learning, Molecular biology, Genetic regulation, Synthetic biology, Molecular biologists, Gene expression
- Abstract
Predicting phenotypes and complex traits from genomic variations has always been a major challenge in molecular biology, at least in part because the task is often complicated by the influences of external stimuli and the environment on the regulation of gene expression. With today's abundance of omic data and advances in high‐throughput computing and machine learning (ML), we now have an unprecedented opportunity to uncover the missing links and molecular mechanisms that control gene expression and phenotypes. To empower molecular biologists and researchers in related fields to start using ML for in‐depth analyses of their large‐scale data, here we provide a summary of fundamental concepts of machine learning, and describe a wide range of research questions and scenarios in molecular biology where ML has been applied. Due to the abundance of data, reproducibility, and genome‐wide coverage, we focus on transcriptomics, and two ML tasks involving it: (a) predicting transcriptomic profiles or transcription levels from genomic variations in DNA, and (b) predicting phenotypes of interest from transcriptomic profiles or transcription levels. Similar approaches can also be applied to more complex data such as those in multi‐omic studies. We envisage that the concepts and examples described here will raise awareness and promote the application of ML among molecular biologists, and eventually help improve a framework for systematic design and predictions of gene expression and phenotypes for synthetic biology applications. [ABSTRACT FROM AUTHOR]
- Published
- 2022
27. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species.
- Author
John, Maura, Haselbeck, Florian, Dass, Rupashree, Malisi, Christoph, Ricca, Patrizia, Dreischer, Christian, Schultheiss, Sebastian J., and Grimm, Dominik G.
- Subjects
Plant species, Phenotypes, Arabidopsis thaliana, Prediction models, Simple machines
- Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we confirm results from other studies that more complex neural network-based architectures are not superior to well-established methods for phenotype prediction. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. Addressing Noise and Estimating Uncertainty in Biomedical Data through the Exploration of Chemical Space.
- Author
deAndrés-Galiana, Enrique J., Fernández-Martínez, Juan Luis, Fernández-Brillet, Lucas, Cernea, Ana, and Kloczkowski, Andrzej
- Subjects
Drug discovery, Drug repositioning, Noise, Artificial intelligence, Parallel programming
- Abstract
Noise is a basic ingredient of data: observed data are always contaminated by unwanted deviations (noise), which, in overdetermined systems (with more data than model parameters), cause the corresponding linear system of equations to have an imperfect solution. In addition, with highly underdetermined parameterizations, noise can be absorbed by the model, generating spurious solutions. This is a very undesirable situation that might lead to incorrect conclusions. We presented a mathematical formalism based on inverse problem theory combined with artificial intelligence methodologies to perform enhanced sampling of noisy biomedical data and improve the finding of meaningful solutions. Random sampling methods fail for high-dimensional biomedical problems; sampling methods such as smart model parameterizations, forward surrogates, and parallel computing are better suited to such problems. We applied these methods to several important biomedical problems, such as phenotype prediction and predicting the effects of protein mutations, i.e., whether a given single residue mutation is neutral or deleterious (causing a disease). We also applied these methods to de novo drug discovery and drug repositioning (repurposing) through enhanced exploration of the huge chemical space. The purpose of these novel methods, which address the problem of noise and uncertainty in biomedical data, is to find new therapeutic solutions, perform drug repurposing, and accelerate and optimize drug discovery, thus reestablishing homeostasis. Finding the right target, the right compound, and the right patient are the three bottlenecks to running successful clinical trials based on the correct analysis of preclinical models. Artificial intelligence can provide a solution to these problems, bearing in mind that, as in any modeling procedure in data analysis, the character of the data restricts the quality of the prediction.
The use of simple and plain methodologies is crucial to tackling these important and challenging problems, particularly drug repositioning/repurposing in rare diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2022
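The opening of this abstract contrasts two regimes of a noisy linear system d = A m + noise. A small synthetic experiment with ordinary least squares (NumPy's `np.linalg.lstsq`, not the authors' inverse-problem sampling methods) shows both: with more data than parameters the fit leaves a nonzero residual, whereas with far more parameters than data the minimum-norm solution fits the noisy data essentially exactly while still missing the true model. The matrix sizes, noise level, and the helper name `lstsq_demo` are illustrative assumptions.

```python
import numpy as np


def lstsq_demo(n_obs, n_params, noise=0.1, seed=0):
    """Fit d = A m + noise by least squares; return (residual norm, model error norm)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_obs, n_params))
    m_true = rng.normal(size=n_params)
    d = A @ m_true + noise * rng.normal(size=n_obs)
    # lstsq returns the least-squares solution, or the minimum-norm one if underdetermined
    m_hat, *_ = np.linalg.lstsq(A, d, rcond=None)
    return np.linalg.norm(A @ m_hat - d), np.linalg.norm(m_hat - m_true)


over_res, over_err = lstsq_demo(n_obs=200, n_params=5)    # more data than parameters
under_res, under_err = lstsq_demo(n_obs=20, n_params=50)  # more parameters than data
print(f"overdetermined:  residual={over_res:.3f}  model error={over_err:.3f}")
print(f"underdetermined: residual={under_res:.2e}  model error={under_err:.3f}")
```

A near-zero residual in the underdetermined case is therefore not evidence of a good model: the noise has been absorbed into a spurious solution, which is the situation the abstract warns about.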
29. Multi-view BLUP: a promising solution for post-omics data integrative prediction.
- Author
Wu B, Xiong H, Zhuo L, Xiao Y, Yan J, and Yang W
- Abstract
Phenotypic prediction is a promising strategy for accelerating plant breeding. Data from multiple sources (called multi-view data) can provide complementary information to characterize a biological object from various aspects. By integrating multi-view information into phenotypic prediction, this paper proposes a multi-view best linear unbiased prediction (MVBLUP) method. To measure the importance of multiple data views, the differential evolution algorithm with an early stopping mechanism was used, by which we obtained a multi-view kinship matrix and then incorporated it into the BLUP model for phenotypic prediction. To further illustrate the characteristics of MVBLUP, we performed empirical experiments on four multi-view datasets in different crops. Compared to the single-view method, the MVBLUP method improved prediction accuracy by 0.038 to 0.201 on average. The results demonstrate that MVBLUP is an effective integrative prediction method for multi-view data. Competing Interests: The authors declare that they have no competing interests. (Copyright © 2024. Published by Elsevier Ltd.)
- Published
- 2024
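The MVBLUP abstract combines per-view kinship matrices into one weighted multi-view kinship matrix and plugs it into BLUP. A minimal NumPy sketch of that pipeline follows, assuming a linear kinship built from standardized features, hand-picked view weights, and a fixed variance ratio `lam` (residual variance over genetic variance). The paper instead learns the view weights with differential evolution and early stopping, which is omitted here; the function names and the two-view synthetic data are hypothetical.

```python
import numpy as np


def kinship(X):
    """Linear kinship (similarity) matrix from a samples x features matrix."""
    sd = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)  # standardize; constant columns become 0
    return Xs @ Xs.T / X.shape[1]


def multi_view_kinship(views, weights):
    """Weighted sum of per-view kinship matrices, with weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * kinship(Xv) for wi, Xv in zip(w, views))


def blup_predict(K, y, train, test, lam=1.0):
    """GBLUP-style prediction for the test samples.
    Model: y = mu + g + e, g ~ N(0, s_g^2 K), e ~ N(0, s_e^2 I), lam = s_e^2 / s_g^2.
    mu is estimated by the plain training mean (a simplification of GLS)."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha


# Hypothetical two-view data: SNP codes and expression values for 60 samples.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(60, 200)).astype(float)
expr = rng.normal(size=(60, 50))
y = 0.1 * snps @ rng.normal(size=200) + rng.normal(scale=0.5, size=60)

K = multi_view_kinship([snps, expr], weights=[0.7, 0.3])  # weights chosen by hand here
train, test = np.arange(40), np.arange(40, 60)
pred = blup_predict(K, y, train, test, lam=1.0)
```

Replacing the fixed `weights` with values chosen by an optimizer that maximizes held-out accuracy would bring this sketch closer to the method described in the abstract; because each view's kinship is positive semidefinite, any non-negative weighting keeps the combined matrix a valid covariance.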
30. Neural network-based predictions of antimicrobial resistance phenotypes in multidrug-resistant Acinetobacter baumannii from whole genome sequencing and gene expression.
- Author
Jia H, Li X, Zhuang Y, Wu Y, Shi S, Sun Q, He F, Liang S, Wang J, Draz MS, Xie X, Zhang J, Yang Q, and Ruan Z
- Subjects
- Multilocus Sequence Typing, Phenotype, Humans, Genome, Bacterial genetics, Acinetobacter Infections microbiology, Acinetobacter Infections drug therapy, Genotype, Acinetobacter baumannii drug effects, Acinetobacter baumannii genetics, Drug Resistance, Multiple, Bacterial genetics, Microbial Sensitivity Tests, Anti-Bacterial Agents pharmacology, Whole Genome Sequencing, Phylogeny, Neural Networks, Computer
- Abstract
Whole genome sequencing (WGS) potentially represents a rapid approach for antimicrobial resistance genotype-to-phenotype prediction. However, fully predicting minimum inhibitory concentrations (MICs) and antimicrobial susceptibility phenotypes from WGS data remains challenging. This study aimed to establish an artificial intelligence-based computational approach for predicting antimicrobial susceptibilities of multidrug-resistant Acinetobacter baumannii from WGS and gene expression data. Antimicrobial susceptibility testing (AST) was performed using the broth microdilution method for 10 antimicrobial agents. In silico multilocus sequence typing (MLST), antimicrobial resistance genes, and phylogeny based on cgSNP and cgMLST strategies were analyzed. High-throughput qPCR was performed to measure the expression levels of antimicrobial resistance (AMR) genes. Most isolates exhibited a high level of resistance to most of the tested antimicrobial agents, with the majority belonging to the IC2/CC92 lineage. Phylogenetic analysis revealed undetected transmission events or local outbreaks. The percentage agreement between AMR phenotype and genotype ranged from 70.08% to 89.96%, with the coefficient of agreement (κ) ranging from 0.025 to 0.881. Deep neural network models predicted AST results with an accuracy of up to 98.64% on the testing data set. Additionally, several linear regression models demonstrated high prediction accuracy, reaching up to 86.15% within an error range of one gradient, indicating a linear relationship between certain gene expressions and the corresponding antimicrobial MICs. In conclusion, neural network-based predictions could be used as a tool for the surveillance of antimicrobial resistance in multidrug-resistant A. baumannii. Competing Interests: The authors declare no conflict of interest.
- Published
- 2024
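The A. baumannii abstract reports phenotype-genotype concordance as a percentage agreement alongside a coefficient of agreement κ. Assuming κ is Cohen's kappa (the usual statistic for this comparison, though the abstract does not name it), the sketch below computes both on hypothetical resistant/susceptible calls for ten isolates; none of the data come from the study, and the function names are illustrative.

```python
from collections import Counter


def percent_agreement(a, b):
    """Percentage of isolates where the two calls match."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # chance agreement: product of the two marginal frequencies, summed over categories
    p_exp = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both use one identical category; kappa is undefined, 1.0 by convention here
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical calls for 10 isolates (R = resistant, S = susceptible).
phenotype = ["R", "R", "R", "S", "S", "R", "S", "R", "R", "S"]  # e.g. from AST
genotype = ["R", "R", "S", "S", "S", "R", "R", "R", "R", "S"]   # e.g. from resistance genes

print(percent_agreement(phenotype, genotype))        # 80.0
print(round(cohens_kappa(phenotype, genotype), 3))   # 0.583
```

Here 80% raw agreement shrinks to κ ≈ 0.58 because, with six R and four S calls on each side, 52% agreement would be expected by chance alone. This is why κ can sit far below the percentage agreement, as in the reported ranges.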
31. Investigation of gut microbiota disorders in norovirus infected children patients based on 16s rRNA sequencing.
- Author
Li J, Jiang N, Zheng H, Zheng X, Xu Y, Weng Y, Jiang F, Wang C, and Chang P
- Subjects
- Humans, Male, Female, Child, Preschool, Child, Infant, Case-Control Studies, Caliciviridae Infections microbiology, Caliciviridae Infections virology, Gastrointestinal Microbiome genetics, RNA, Ribosomal, 16S genetics, Norovirus genetics, Norovirus isolation & purification, Feces microbiology, Feces virology, Gastroenteritis microbiology, Gastroenteritis virology
- Abstract
Background: Norovirus is the leading cause of sporadic viral gastroenteritis cases and outbreaks. Gut microbiota plays a key role in maintaining immune homeostasis. We aimed to investigate the composition and functional effects of gut microbiota in children infected with norovirus. Methods: Stool samples were collected from 31 children infected with norovirus and 25 healthy children. The gut microbiota was analyzed by 16S rRNA gene sequencing, followed by composition, correlation network, functional and phenotype prediction analyses. Results: Gut microbiota in children infected with norovirus was characterized by lower species richness and diversity. Veillonella was the dominant gut microbial genus in norovirus infection. Blautia was significantly lower in norovirus infection. There was a positive correlation among Faecalibacterium, Blautia, Subdoligranulum, Eubacterium_hallii_group, Fusicatenibacter, Agathobacter, Roseburia and Dorea. Functionally, secondary metabolites biosynthesis, transport and catabolism, selenocysteine lyase and peroxiredoxin were the functional categories most significantly increased in the gut microbiota of the norovirus infection group, whereas sn-glycerol-1-phosphate dehydrogenase and fermentation were the most significantly decreased. Phenotype analysis showed that Contains_Mobile_Elements was the most abundant phenotype in the gut microbiota of the norovirus infection group. Conclusion: Norovirus infection may lead to dysregulation of the gut microbiome in children.
- Published
- 2024
32. The Utilization of Different Classifiers to Perform Drug Repositioning in Inclusion Body Myositis Supports the Concept of Biological Invariance
- Author
Álvarez-Machancoses, Óscar, deAndrés-Galiana, Enrique, Fernández-Martínez, Juan Luis, Kloczkowski, Andrzej, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rutkowski, Leszek, editor, Scherer, Rafał, editor, Korytkowski, Marcin, editor, Pedrycz, Witold, editor, Tadeusiewicz, Ryszard, editor, and Zurada, Jacek M., editor
- Published
- 2020
33. A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data
- Author
Mandal, Sayan, Guzmán-Sáenz, Aldo, Haiminen, Niina, Basu, Saugata, Parida, Laxmi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Martín-Vide, Carlos, editor, Vega-Rodríguez, Miguel A., editor, and Wheeler, Travis, editor
- Published
- 2020
34. External visible characteristics prediction through SNPs analysis in the forensic setting: a review
- Author
Pamela Tozzo, Caterina Politi, Arianna Delicati, Andrea Gabbin, and Luciana Caenazzo
- Subjects
external visible characteristics, forensic DNA phenotyping, SNPs analysis, phenotype prediction, review, Biochemistry, QD415-436, Biology (General), QH301-705.5
- Abstract
Numerous major advances have been made in forensic genetics over the past decade. One recent field of research has focused on the analysis of External Visible Characteristics (EVC) such as eye colour, hair colour (including hair greying), hair morphology, skin colour, freckles, facial morphology, high myopia, obesity, and adult height, with important repercussions in the forensic field. EVC analysis could be especially useful in investigative cases where there are no potential suspects and no match between the evidence DNA sample under investigation and any genetic profiles entered into criminal databases. The present review summarizes the current state of knowledge of SNPs (Single Nucleotide Polymorphisms) regarding visible characteristics, including the latest research progress in identifying new genetic markers, their most promising applications in the forensic field and the implications for police investigations. The applicability of these techniques to concrete cases has stoked a heated debate in the literature on the ethical implications of using these predictive tools for visible traits.
- Published
- 2021
35. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species
- Author
Maura John, Florian Haselbeck, Rupashree Dass, Christoph Malisi, Patrizia Ricca, Christian Dreischer, Sebastian J. Schultheiss, and Dominik G. Grimm
- Subjects
phenotype prediction, genomic selection, plant phenotyping, machine learning, Arabidopsis thaliana, Plant culture, SB1-1110
- Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we confirm results from other studies that more complex neural network-based architectures are not superior to well-established methods for phenotype prediction. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
- Published
- 2022
36. 16S rDNA Sequencing-Based Insights into the Bacterial Community Structure and Function in Co-Existing Soil and Coal Gangue
- Author
Mengying Ruan, Zhenqi Hu, Qi Zhu, Yuanyuan Li, and Xinran Nie
- Subjects
coal gangue, co-occurrence network, microbial community, bacterial functional potential prediction, phenotype prediction, Biology (General), QH301-705.5
- Abstract
Coal gangue is a solid waste generated during coal production. Coal gangue is stockpiled adjacent to mining land and has characteristics similar to those of the soils of these areas. Coal gangue–soil ecosystems provide habitats for a rich and active bacterial community. However, co-existence networks and the functionality of soil and coal gangue bacterial communities have not been studied. Here, we performed Illumina MiSeq high-throughput sequencing, symbiotic network and statistical analyses, and microbial phenotype prediction to study the microbial community in coal gangue and soil samples from Shanxi Province, China. In general, the structural difference between the bacterial communities in coal gangue and soil was large, indicating that interactions between soil and coal gangue are limited but not absent. The bacterial community exhibited a significant symbiosis network in soil and coal gangue. The co-occurrence network was primarily formed by Proteobacteria, Firmicutes, and Actinobacteria. In addition, BugBase microbiome phenotype predictions and PICRUSt bacterial functional potential predictions showed that transcription regulators represented the largest functional category of symbiotic bacteria in soil and coal gangue. Proteobacteria played an important role in various processes such as mobile element pathogenicity, oxidative stress tolerance, and biofilm formation. Overall, this work provides a theoretical basis and data support for the in situ remediation of acidified coal gangue hills based on microbiological methods.
- Published
- 2023
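The abstract reports a co-occurrence (symbiotic) network but does not say how edges were defined. This sketch assumes one common construction: two taxa are linked when the Spearman correlation of their abundances across samples reaches a fixed threshold. Real analyses usually add p-value or FDR filtering, which this sketch omits. The 0.6 threshold and the abundance values are hypothetical.

```python
from itertools import combinations


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def cooccurrence_edges(abundance, threshold=0.6):
    """abundance: dict taxon -> abundances over the same ordered samples.
    Returns (taxon1, taxon2, rho) for each pair with |Spearman rho| >= threshold."""
    edges = []
    for t1, t2 in combinations(sorted(abundance), 2):
        rho = pearson(rank(abundance[t1]), rank(abundance[t2]))  # Spearman = Pearson on ranks
        if abs(rho) >= threshold:
            edges.append((t1, t2, round(rho, 3)))
    return edges


profiles = {  # hypothetical relative abundances (%) in five samples
    "Proteobacteria": [10, 20, 30, 40, 50],
    "Firmicutes": [5, 9, 14, 20, 26],
    "Actinobacteria": [50, 41, 30, 22, 10],
    "Other": [7, 3, 9, 1, 5],
}
edges = cooccurrence_edges(profiles)
print(edges)
```

In the toy data the three phyla move in lock-step (two together, one inversely), so they form a triangle of edges. The erratic `Other` column stays unconnected.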
37. Genome-scale metabolic network models: from first-generation to next-generation.
- Author
- Ye, Chao, Wei, Xinyu, Shi, Tianqiong, Sun, Xiaoman, Xu, Nan, Gao, Cong, and Zou, Wei
- Subjects
- *METABOLIC models, *DRUG target, *BIOLOGICAL models, *MACHINE learning, *PHENOTYPES, *BIOTECHNOLOGY
- Abstract
Over the last two decades, thousands of genome-scale metabolic network models (GSMMs) have been constructed. These GSMMs have been widely applied in various fields, ranging from network interaction analysis to cell phenotype prediction. However, due to the lack of constraints, the prediction accuracy of first-generation GSMMs was limited. To overcome these limitations, next-generation GSMMs were developed by integrating omics data, adding constraints, integrating different biological models, and constructing whole-cell models. Here, we review recent advances of GSMMs from the first generation to the next generation. Then, we discuss the major applications of GSMMs in industrial biotechnology, such as predicting phenotypes and guiding metabolic engineering. In addition, human health applications, including understanding biological mechanisms and discovering biomarkers and drug targets, are also summarized. Finally, we address the challenges and propose new trends for GSMMs.
Key points:
• This mini-review updates the literature on almost all published GSMMs since 1999.
• Detailed insights into the development of the first- and next-generation GSMMs.
• The application of GSMMs is summarized, and the prospects of integrating machine learning are emphasized. [ABSTRACT FROM AUTHOR]
- Published
- 2022
38. Assessment of deep learning and transfer learning for cancer prediction based on gene expression data.
- Author
- Hanczar, Blaise, Bourgeais, Victoria, and Zehraoui, Farida
- Subjects
- *DEEP learning, *ARTIFICIAL neural networks, *GENE expression, *TRANSFER of training, *MACHINE learning
- Abstract
Background: Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new for this task, and there is no clear consensus about its performance and utility. Few experimental studies have evaluated deep neural networks and compared them with state-of-the-art machine learning. Moreover, their conclusions are not consistent. Results: We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performance of neural networks with the state of the art. We also investigate the effectiveness of several transfer learning schemes in different experimental setups. Conclusion: Based on our experiments, we provide several recommendations to optimize the construction and training of a neural network model. We show that neural networks outperform the state-of-the-art methods only for very large training set sizes. For small training sets, we show that transfer learning is possible and may strongly improve model performance in some cases. [ABSTRACT FROM AUTHOR]
- Published
- 2022
39. Deep GONet: self-explainable deep neural network based on Gene Ontology for phenotype prediction from gene expression data
- Author
- Victoria Bourgeais, Farida Zehraoui, Mohamed Ben Hamdoune, and Blaise Hanczar
- Subjects
- Gene expression, Phenotype prediction, Model interpretation, Deep learning, Gene Ontology, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which prompts the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profiles. Existing deep learning models are usually considered black boxes that provide accurate predictions but are not interpretable. However, accuracy and interpretation are both essential for precision medicine. In addition, most models do not integrate domain knowledge. Hence, the main focus of this paper is making deep learning models interpretable for medical applications using prior biological knowledge. Results: In this paper, we propose a new self-explainable deep learning model, called Deep GONet, integrating the Gene Ontology into the hierarchical architecture of the neural network. This model is based on a fully-connected architecture constrained by the Gene Ontology annotations, such that each neuron represents a biological function. The experiments on cancer diagnosis datasets demonstrate that Deep GONet is both easily interpretable and highly performant at discriminating cancer and non-cancer samples. Conclusions: Our model provides an explanation for its predictions by identifying the most important neurons and associating them with biological functions, making the model understandable for biologists and physicians.
- Published
- 2021
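The abstract describes a fully-connected layer constrained by Gene Ontology annotations so that each neuron stands for a biological function. It does not say how the constraint is enforced. This sketch assumes the simplest form, a hard binary mask on the gene-to-term weights. The gene names, GO term labels, annotations and weights are hypothetical.

```python
def relu(z):
    return z if z > 0 else 0.0


def go_masked_layer(expression, weights, mask, bias):
    """One layer whose neurons stand for GO terms.

    expression: list of G gene expression values
    weights, mask: G x T nested lists; mask[g][t] is 1 if gene g is annotated
    with term t and 0 otherwise
    bias: list of T term biases
    Multiplying by the mask removes every non-annotated connection, so a
    term neuron can only respond to the genes annotated with that term."""
    n_genes, n_terms = len(expression), len(bias)
    return [
        relu(bias[t] + sum(expression[g] * weights[g][t] * mask[g][t] for g in range(n_genes)))
        for t in range(n_terms)
    ]


# Hypothetical annotations; columns are ("apoptosis", "DNA repair")
genes = ["TP53", "BRCA1", "MYC", "GAPDH"]
mask = [[1, 1],   # TP53: both terms
        [0, 1],   # BRCA1: DNA repair only
        [1, 0],   # MYC: apoptosis only
        [0, 0]]   # GAPDH: no annotation in this toy example
weights = [[0.5, 0.5] for _ in genes]
bias = [0.0, 0.0]

print(go_masked_layer([1.0, 2.0, 3.0, 4.0], weights, mask, bias))  # [2.0, 1.5]
```

Changing the unannotated GAPDH value leaves both term neurons unchanged. That property is what allows each hidden unit to be read as a GO term.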
40. Lessons from the CAGI‐4 Hopkins clinical panel challenge
- Author
- Chandonia, John‐Marc, Adhikari, Aashish, Carraro, Marco, Chhibber, Aparna, Cutting, Garry R, Fu, Yao, Gasparini, Alessandra, Jones, David T, Kramer, Andreas, Kundu, Kunal, Lam, Hugo YK, Leonardi, Emanuela, Moult, John, Pal, Lipika R, Searls, David B, Shah, Sohela, Sunyaev, Shamil, Tosatto, Silvio CE, Yin, Yizhou, and Buckley, Bethany A
- Subjects
- Biological Sciences, Biomedical and Clinical Sciences, Clinical Sciences, Genetics, Clinical Research, 4.2 Evaluation of markers and technologies, Detection, screening and diagnosis, Computational Biology, Databases, Genetic, Genetic Predisposition to Disease, Genetic Testing, Humans, Phenotype, Sequence Analysis, DNA, CAGI, genetic testing, phenotype prediction, variant interpretation, Genetics & Heredity, Clinical sciences
- Abstract
The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.
- Published
- 2017
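The 84% and 62% figures count a patient as solved when at least one group's prediction was correct. The abstract also counts patients that only one group diagnosed. Both tallies follow from a per-group table of correct and incorrect calls. This sketch assumes such a table exists; the three groups and five patients in it are hypothetical, not challenge data.

```python
def solved_by_any(correct_by_group):
    """correct_by_group: dict mapping group name -> list of booleans, one per
    patient (True if that group's predicted disease class was correct).
    Returns (patients solved by at least one group, number of patients)."""
    per_patient = list(zip(*correct_by_group.values()))
    return sum(1 for flags in per_patient if any(flags)), len(per_patient)


def solved_uniquely(correct_by_group):
    """Indices of patients that exactly one group diagnosed, keyed by group."""
    groups = list(correct_by_group)
    unique = {g: [] for g in groups}
    for i, flags in enumerate(zip(*correct_by_group.values())):
        if sum(flags) == 1:
            unique[groups[flags.index(True)]].append(i)
    return unique


toy = {  # hypothetical results for 5 patients
    "A": [True, False, False, True, False],
    "B": [True, True, False, False, False],
    "C": [False, False, False, True, True],
}
print(solved_by_any(toy))    # (4, 5)
print(solved_uniquely(toy))  # {'A': [], 'B': [1], 'C': [4]}

# The percentages reported in the abstract
print(round(100 * 36 / 43), round(100 * 39 / 63))  # 84 62
```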
41. Plant Genotype to Phenotype Prediction Using Machine Learning.
- Author
- Danilevicz, Monica F., Gill, Mitchell, Anderson, Robyn, Batley, Jacqueline, Bennamoun, Mohammed, Bayer, Philipp E., and Edwards, David
- Subjects
- MACHINE learning, METADATA, PLANT breeding, GENOTYPES, PHENOTYPES, DRONE aircraft
- Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or deal with high dimension datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
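The review uses GBLUP as the reference statistical method. As a rough sketch of what GBLUP computes, the code below builds a genomic relationship matrix G with VanRaden's first method. It then returns genomic values g = G(G + lam*I)^-1 (y - mean(y)). The sketch assumes the variance ratio lam = sigma_e^2 / sigma_g^2 is known and replaces the fixed-effect estimate with the sample mean. Full GBLUP instead estimates the variance components (for example by REML) and solves the mixed-model equations. The genotypes and phenotypes are hypothetical.

```python
def vanraden_G(M):
    """Genomic relationship matrix, VanRaden (2008) method 1.
    M: n x m genotype matrix coded 0/1/2 (copies of the counted allele)."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]  # allele frequencies
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]     # centred genotypes
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(Z[a][k] * Z[b][k] for k in range(m)) / scale for b in range(n)]
            for a in range(n)]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gblup(G, y, lam=1.0):
    """g_hat = G (G + lam*I)^-1 (y - mean(y)), with lam = sigma_e^2 / sigma_g^2 assumed known."""
    n = len(y)
    mu = sum(y) / n
    A = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, [v - mu for v in y])
    return [sum(G[i][j] * alpha[j] for j in range(n)) for i in range(n)]


M = [[0, 0], [0, 0], [2, 2], [2, 2]]  # hypothetical genotypes: 4 lines, 2 markers
y = [1.0, 1.0, 5.0, 5.0]              # hypothetical phenotypes
G = vanraden_G(M)
g_hat = gblup(G, y)
print([round(g, 3) for g in g_hat])  # [-1.778, -1.778, 1.778, 1.778]
```

The phenotypic deviations here are +/-2. The predicted genomic values are +/-16/9, which is those deviations shrunk by 8/9. This shows how lam pulls the estimates towards the mean.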
42. Drug resistance prediction and resistance genes identification in Mycobacterium tuberculosis based on a hierarchical attentive neural network utilizing genome-wide variants.
- Author
- Jiang, Zhonghua, Lu, Yongmei, Liu, Zhuochong, Wu, Wei, Xu, Xinyi, Dinnyés, András, Yu, Zhonghua, Chen, Li, and Sun, Qun
- Subjects
- *MYCOBACTERIUM tuberculosis, *DRUG resistance, *ARTIFICIAL neural networks, *NATURAL language processing, *GENETIC variation
- Abstract
Prediction of antimicrobial resistance from whole-genome sequencing data has attracted increasing attention because of its speed and convenience. Numerous machine learning–based studies have used genetic variants to predict drug resistance in Mycobacterium tuberculosis (MTB), assuming that variants are homogeneous. However, most of these studies have ignored the essential correlation between variants and their corresponding genes when encoding variants, and have used only a limited number of variants as prediction input. In this study, taking advantage of genome-wide variants for drug-resistance prediction and inspired by natural language processing, we frame drug resistance prediction as document classification, in which variants are considered words, the mutated genes in an isolate sentences, and an isolate a document. We propose a novel hierarchical attentive neural network model (HANN) that helps discover drug resistance-related genes and variants and yields more interpretable biological results. It captures the interaction among variants in a mutated gene as well as among mutated genes in an isolate. Our results show that for the four first-line drugs isoniazid (INH), rifampicin (RIF), ethambutol (EMB) and pyrazinamide (PZA), the HANN achieves optimal areas under the ROC curve of 97.90, 99.05, 96.44 and 95.14% and optimal sensitivities of 94.63, 96.31, 92.56 and 87.05%, respectively. In addition, without any domain knowledge, the model identifies drug resistance-related genes and variants consistent with those confirmed by previous studies, and more importantly, it discovers one more potential drug-resistance-related gene. [ABSTRACT FROM AUTHOR]
- Published
- 2022
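The document analogy in the abstract (variants as words, genes as sentences, an isolate as a document) maps onto two stacked attention-pooling steps. This is a simplified sketch of that idea, not the HANN itself. It scores each vector by a dot product with a fixed context vector. A trained model would instead learn the embeddings and context vectors, and usually adds a projection layer. The variant names, embeddings and context vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weighted average of vectors, with weights = softmax(vector . context)."""
    weights = softmax([sum(a * b for a, b in zip(v, context)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_isolate(isolate, variant_emb, word_ctx, sent_ctx):
    """isolate: dict gene -> list of variant ids (the words of one sentence).
    Variant-level attention builds one vector per gene; gene-level attention
    then builds the isolate (document) vector. Returns that vector and the
    gene attention weights, which one would inspect to rank genes."""
    genes = list(isolate)
    gene_vecs = [attention_pool([variant_emb[v] for v in isolate[g]], word_ctx)[0]
                 for g in genes]
    doc_vec, gene_weights = attention_pool(gene_vecs, sent_ctx)
    return doc_vec, dict(zip(genes, gene_weights))


variant_emb = {"katG_S315T": [2.0, 0.0], "gyrA_syn": [0.1, 0.1]}  # hypothetical
isolate = {"katG": ["katG_S315T"], "gyrA": ["gyrA_syn"]}
doc_vec, gene_weights = encode_isolate(isolate, variant_emb, [1.0, 1.0], [1.0, 0.0])
print(gene_weights)
```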
43. MegaD: Deep Learning for Rapid and Accurate Disease Status Prediction of Metagenomic Samples.
- Author
- Mreyoud, Yassin, Song, Myoungkyu, Lim, Jihun, and Ahn, Tae-Hyuk
- Subjects
- *DEEP learning, *METAGENOMICS, *MACHINE learning, *FORECASTING, *PHENOTYPES
- Abstract
The diversity within different microbiome communities that drive biogeochemical processes influences many different phenotypes. Analyses of these communities and their diversity by countless microbiome projects have revealed an important role of metagenomics in understanding the complex relation between microbes and their environments. This relationship can be understood in the context of microbiome composition of specific known environments. These compositions can then be used as a template for predicting the status of similar environments. Machine learning has been applied as a key component of this predictive task. Several analysis tools have already been published utilizing machine learning methods for metagenomic analysis. Despite the previously proposed machine learning models, the performance of deep neural networks is still under-researched. Given the nature of metagenomic data, deep neural networks could substantially boost prediction accuracy in metagenomic analysis applications. To meet this urgent demand, we present a deep learning based tool that utilizes a deep neural network implementation for phenotypic prediction of unknown metagenomic samples. (1) First, our tool takes as input taxonomic profiles from 16S or WGS sequencing data. (2) Second, given the samples, our tool builds a model based on a deep neural network by computing multi-level classification. (3) Lastly, given the model, our tool classifies an unknown sample from its unlabeled taxonomic profile. In the benchmark experiments, we found that an analysis method utilizing a deep neural network, such as our tool, can show promising results in increasing the prediction accuracy on several samples compared to other machine learning models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
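The three steps in the abstract can be sketched with a very small feed-forward network: a taxonomic profile goes in, a network is trained on labelled samples, and an unknown sample is classified. This is a toy built on several assumptions, not MegaD. It uses one tanh hidden layer instead of a deep network, a binary label instead of multi-level classification, total-sum scaling as the profile normalisation, and plain stochastic gradient descent. The profiles are simulated.

```python
import math
import random


def tss(counts):
    """Total-sum scaling: raw taxon counts -> relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def forward(model, x):
    W1, b1, W2, b2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = sum(w * hk for w, hk in zip(W2, h)) + b2
    return h, 1.0 / (1.0 + math.exp(-z))


def train(X, y, hidden=4, lr=0.5, epochs=300, seed=0):
    """SGD on binary cross-entropy for a one-hidden-layer tanh network."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward((W1, b1, W2, b2), x)
            g = p - t  # d(loss)/d(logit) for sigmoid + cross-entropy
            for k in range(hidden):
                gh = g * W2[k] * (1 - h[k] ** 2)  # back-prop through tanh, using the old W2[k]
                W2[k] -= lr * g * h[k]
                for j in range(d):
                    W1[k][j] -= lr * gh * x[j]
                b1[k] -= lr * gh
            b2 -= lr * g
    return W1, b1, W2, b2


# Simulated 3-taxon profiles: label 1 is dominated by taxon 0, label 0 by taxon 2
rng = random.Random(1)
profiles, labels = [], []
for i in range(20):
    label = i % 2
    base = [80, 10, 10] if label == 1 else [10, 10, 80]
    profiles.append(tss([c + rng.randint(0, 10) for c in base]))
    labels.append(label)

model = train(profiles, labels)
unknown = tss([85, 7, 8])  # an unlabelled profile to classify
print(round(forward(model, unknown)[1], 3))
```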
44. Towards a robust out-of-the-box neural network model for genomic data.
- Author
- Zhang, Zhaoyi, Cheng, Songyang, and Solis-Lemus, Claudia
- Subjects
- *ARTIFICIAL neural networks, *RECURRENT neural networks, *CONVOLUTIONAL neural networks, *COMPUTER vision, *NATURAL language processing, *BIG data
- Abstract
Background: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. Results: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Overall, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. Conclusions: While the prospect of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
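The models compared in the abstract need a numerical encoding of the sequences. One-hot encoding is a common choice, although the abstract does not say which encoding the authors used. This minimal sketch one-hot encodes a DNA sequence and scans it with a single convolutional filter, the basic operation of a CNN layer, here without bias or activation. The motif filter and the sequence are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a len(seq) x 4 matrix with columns A, C, G, T.
    Any other symbol (e.g. N) becomes an all-zero row."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(x, kernel):
    """'Valid' 1-D cross-correlation of a one-hot matrix with one k x 4 filter,
    i.e. what a convolutional layer with a single filter computes before bias
    and activation. Returns len(x) - k + 1 scores."""
    k = len(kernel)
    return [
        sum(x[i + r][c] * kernel[r][c] for r in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


motif_filter = one_hot("TATA")  # hypothetical hand-set filter
scores = conv1d(one_hot("GGCTATAAGC"), motif_filter)
print(scores)  # [0.0, 2.0, 0.0, 4.0, 1.0, 2.0, 1.0]
```

A filter equal to the one-hot encoding of TATA scores 4 only where the full motif occurs. In a trained CNN such filters are learned rather than written by hand. An RNN would instead read the encoded rows one position at a time.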
45. MegaR: an interactive R package for rapid sample classification and phenotype prediction using metagenome profiles and machine learning
- Author
- Eliza Dhungel, Yassin Mreyoud, Ho-Jin Gwak, Ahmad Rajeh, Mina Rho, and Tae-Hyuk Ahn
- Subjects
- Metagenomics, Machine learning, R-package, Phenotype prediction, Sample classification, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Diverse microbiome communities drive biogeochemical processes and the evolution of animals in their ecosystems. Many microbiome projects have demonstrated the power of using metagenomics to understand the structures and factors influencing the function of the microbiomes in their environments. In order to characterize the effects of microbiome composition on human health, diseases, and even ecosystems, one must first understand the relationship between microbes and their environment in different samples. Running machine learning models on metagenomic sequencing data is encouraged for this purpose, but it is not an easy task to build an appropriate machine learning model for all diverse metagenomic datasets. Results: We introduce MegaR, an R Shiny package and web application, to build an unbiased machine learning model effortlessly with interactive visual analysis. MegaR employs taxonomic profiles from either whole metagenome sequencing or 16S rRNA sequencing data to develop machine learning models and classify the samples into two or more categories. It provides various options for model fine-tuning throughout the analysis pipeline, such as data processing, multiple machine learning techniques, model validation, and unknown sample prediction, which can be used to achieve the highest prediction accuracy possible for any given dataset while still maintaining a user-friendly experience. Conclusions: Metagenomic sample classification and phenotype prediction are important, particularly when applied as a diagnostic method for identifying and predicting microbe-related human diseases. MegaR provides various interactive visualizations that let users build an accurate machine-learning model without difficulty. Unknown sample prediction with a properly trained model using MegaR will help researchers identify sample properties with a fast turnaround time.
- Published
- 2021
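Model validation is one of the pipeline options listed for MegaR. MegaR is an R package, and the abstract does not describe its validation routines. The following is therefore a generic, hypothetical sketch of stratified k-fold cross-validation, which keeps each class's proportion roughly constant across folds. That matters for metagenomic cohorts, where one phenotype is often much rarer than the other.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k test folds that preserve class proportions.
    Each class is shuffled and dealt round-robin across the folds; the deal
    continues where the previous class stopped, so fold sizes stay balanced."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    return [sorted(f) for f in folds]


def train_test_splits(labels, k=5, seed=0):
    """Yield (train indices, test indices) for each of the k folds."""
    for test in stratified_kfold(labels, k, seed):
        held_out = set(test)
        yield [i for i in range(len(labels)) if i not in held_out], test


labels = ["healthy"] * 10 + ["disease"] * 5  # hypothetical sample labels
for train_idx, test_idx in train_test_splits(labels, k=5):
    print(len(train_idx), [labels[i] for i in test_idx])
```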
46. Plant Genotype to Phenotype Prediction Using Machine Learning
- Author
- Monica F. Danilevicz, Mitchell Gill, Robyn Anderson, Jacqueline Batley, Mohammed Bennamoun, Philipp E. Bayer, and David Edwards
- Subjects
- machine learning, plant phenotyping, phenotype prediction, plant breeding, big data, Genetics, QH426-470
- Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or deal with high dimension datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
- Published
- 2022
47. Biological interpretation of deep neural network for phenotype prediction based on gene expression
- Author
- Blaise Hanczar, Farida Zehraoui, Tina Issa, and Mathieu Arles
- Subjects
- Deep neural network, Biological interpretation, Phenotype prediction, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The use of predictive gene signatures to assist clinical decision-making is becoming more and more important. Deep learning has huge potential for predicting phenotypes from gene expression profiles. However, neural networks are viewed as black boxes, where accurate predictions are provided without any explanation. The requirements for these models to become interpretable are increasing, especially in the medical field. Results: We focus on explaining the predictions of a deep neural network model built from gene expression data. The most important neurons and genes influencing the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than state-of-the-art approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians. Conclusion: We propose an original approach for biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can, therefore, lead to new biological hypotheses to be investigated by biologists.
- Published
- 2020
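The abstract does not name the attribution method used to find the most important neurons and genes. As a stand-in, this sketch applies two simple, widely used scores to a one-hidden-layer ReLU network. The first is a hidden neuron's contribution v_k * h_k to the output. The second is gradient × input for each gene. The weights and expression values are hypothetical.

```python
def forward(x, W, b, v, c):
    """One hidden ReLU layer (weights W, biases b) and a linear output (v, c)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    h = [max(zk, 0.0) for zk in z]
    return z, h, sum(vk * hk for vk, hk in zip(v, h)) + c


def neuron_contributions(x, W, b, v, c):
    """Contribution v_k * h_k of each hidden neuron to the output score."""
    _, h, _ = forward(x, W, b, v, c)
    return [vk * hk for vk, hk in zip(v, h)]


def gene_relevance(x, W, b, v, c):
    """Gradient x input per gene. The gradient of the output with respect to
    gene j sums v_k * W[k][j] over the hidden units whose ReLU is active."""
    z, _, _ = forward(x, W, b, v, c)
    grad = [sum(v[k] * W[k][j] for k in range(len(W)) if z[k] > 0) for j in range(len(x))]
    return [xj * gj for xj, gj in zip(x, grad)]


# Hypothetical network: 3 genes -> 2 hidden neurons -> 1 cancer score
W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
b = [0.0, 0.0]
v = [1.0, -0.5]
c = 0.0
x = [2.0, 1.0, 5.0]
print(neuron_contributions(x, W, b, v, c))  # [2.0, -1.0]
print(gene_relevance(x, W, b, v, c))        # [2.0, -1.0, 0.0]
```

This toy network has no biases, so the gene relevances add up exactly to the output (2 - 1 + 0 = 1), which is a convenient sanity check. With biases, that equality no longer holds in general.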
48. Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations
- Author
- Benjamin J Livesey and Joseph A Marsh
- Subjects
- missense mutations, phenotype prediction, protein structure, saturation mutagenesis, variant effect, Biology (General), QH301-705.5, Medicine (General), R5-920
- Abstract
To deal with the huge number of novel protein‐coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top‐ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
- Published
- 2020
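Discrimination between pathogenic and benign missense variants is typically scored with the area under the ROC curve. The abstract does not state the exact metric, so this is an assumed, minimal sketch. It computes AUC as the Mann-Whitney probability that a randomly chosen pathogenic variant outscores a randomly chosen benign one. The scores are hypothetical, and higher is assumed to mean more damaging. DMS fitness values, where lower means more damaging, are negated first.

```python
def roc_auc(pathogenic, benign):
    """Probability that a random pathogenic variant scores higher than a random
    benign one, with ties counted as 1/2 (Mann-Whitney U / (n_p * n_b))."""
    wins = 0.0
    for p in pathogenic:
        for q in benign:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


vep_pathogenic = [0.9, 0.8, 0.4]  # hypothetical predictor scores
vep_benign = [0.3, 0.5, 0.1]
print(roc_auc(vep_pathogenic, vep_benign))  # 8 of 9 pairs ranked correctly

# Hypothetical DMS fitness: lower fitness = more damaging, so negate before scoring
dms_pathogenic = [0.1, 0.2, 0.3]
dms_benign = [0.9, 1.0, 0.25]
print(roc_auc([-f for f in dms_pathogenic], [-f for f in dms_benign]))
```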
49. G2PDeep-v2: a web-based deep-learning framework for phenotype prediction and biomarker discovery using multi-omics data.
- Author
- Zeng S, Adusumilli T, Awan SZ, Immadi MS, Xu D, and Joshi T
- Abstract
The G2PDeep-v2 server is a web-based platform powered by deep learning for phenotype prediction and marker discovery from multi-omics data in any organism, including humans, plants, animals, and viruses. The server provides multiple services for researchers to create deep-learning models through an interactive interface and train these models using an automated hyperparameter tuning algorithm on high-performance computing resources. Users can visualize the results of phenotype and marker predictions and perform Gene Set Enrichment Analysis for the significant markers to provide insights into the molecular mechanisms underlying complex diseases and other biological processes. The G2PDeep-v2 server is publicly available at https://g2pdeep.org/.
- Published
- 2024
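Gene Set Enrichment Analysis asks whether a gene set clusters at the top or bottom of a ranked gene list. The abstract does not say which GSEA variant the server runs. As an assumption, this sketch implements the weighted running-sum enrichment score of the original GSEA method (weight exponent p = 1) for a pre-ranked list. It leaves out the permutation test that provides significance. The gene names and scores are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA running-sum enrichment score for a pre-ranked list.
    ranked: list of (gene, score) sorted by decreasing score.
    Hits step up by |score|^p / (sum over hits of |score|^p); misses step down
    by 1 / (number of misses). Returns the signed maximum deviation from zero."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must cover some, but not all, ranked genes")
    norm = sum(abs(s) ** p for (_, s), hit in zip(ranked, hits) if hit)
    miss_step = 1.0 / (len(ranked) - n_hits)
    running, best = 0.0, 0.0
    for (_, s), hit in zip(ranked, hits):
        running += abs(s) ** p / norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]  # hypothetical
print(enrichment_score(ranked, {"A", "B"}))  # set at the top: ES close to +1
print(enrichment_score(ranked, {"D", "E"}))  # set at the bottom: ES close to -1
```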
50. Study on the difference of gut microbiota in DLY and Diqing Tibetan pigs induce by high fiber diet.
- Author
- Yang L, Yao B, Zhang S, Yang Y, Pan H, Zeng X, and Qiao S
- Abstract
We aimed to investigate the pattern of fecal microbial changes induced by dietary fiber in Landrace × Large White × Duroc (DLY) and Diqing Tibetan pigs (TP), and to further explore the buffering effect of different intestinal flora structures on dietary stress. DLY (n = 15) and TP (n = 15) were assigned to two treatments and then fed a diet containing 20% neutral detergent fiber (NDF) for 9 days. Our results showed that the feed conversion efficiency of TP was significantly higher (p < 0.05) than that of DLY. The fecal microorganisms shared by the two groups gradually increased with the feeding cycle. In addition, the dispersion of the Shannon, Simpson, ACE and Chao indices of TP decreased. We also found that the fecal microorganisms of TP (R² = 0.2089, p < 0.01) and DLY (R² = 0.3982, p < 0.01) differed significantly across feeding cycles. With the prolongation of the feeding cycle, the similarity of fecal microbial composition between DLY and TP increased. Our study strongly suggests that the complex environment and diet structure have shaped the unique gut microbiota of TP, which plays a vital role in the buffering effect of high-fiber diets. (© 2024 Wiley‐VCH GmbH. Published by John Wiley & Sons Ltd.)
- Published
- 2024
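The abstract tracks the Shannon, Simpson, ACE and Chao indices. These indices have several published conventions, so this sketch fixes some assumptions:

- natural-log Shannon entropy;
- the Gini-Simpson form 1 - sum(p^2);
- classic Chao1, switching to the bias-corrected form when there are no doubletons;
- ACE with a rare-taxon threshold of 10 reads.

The counts are hypothetical.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon entropy H' = -sum p_i ln p_i over observed taxa (natural log)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def gini_simpson(counts):
    """1 - sum p_i^2: probability that two random reads come from different taxa."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """S_obs + F1^2 / (2 F2); bias-corrected F1 (F1 - 1) / (2 (F2 + 1)) when F2 == 0."""
    obs = [c for c in counts if c > 0]
    f1, f2 = obs.count(1), obs.count(2)
    if f2 > 0:
        return len(obs) + f1 * f1 / (2 * f2)
    return len(obs) + f1 * (f1 - 1) / (2 * (f2 + 1))


def ace(counts, rare_threshold=10):
    """Abundance-based coverage estimator (Chao & Lee 1992)."""
    obs = [c for c in counts if c > 0]
    rare = [c for c in obs if c <= rare_threshold]
    s_abund = len(obs) - len(rare)
    n_rare = sum(rare)
    if n_rare == 0:
        return float(s_abund)
    f1 = rare.count(1)
    c_ace = 1 - f1 / n_rare  # sample coverage estimate of the rare taxa
    if c_ace == 0:
        raise ValueError("ACE is undefined when every rare taxon is a singleton")
    freq = Counter(rare)  # F_i: number of rare taxa seen exactly i times
    gamma_sq = max(len(rare) * sum(i * (i - 1) * f for i, f in freq.items())
                   / (c_ace * n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + len(rare) / c_ace + f1 / c_ace * gamma_sq


otu_counts = [1, 1, 2, 3, 20]  # hypothetical reads per taxon in one sample
print(round(shannon(otu_counts), 3), round(gini_simpson(otu_counts), 3),
      chao1(otu_counts), round(ace(otu_counts), 3))
```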
Discovery Service for Jio Institute Digital Library