Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework.

Authors :
Charoenkwan P
Chumnanpuen P
Schaduangrat N
Shoombuatong W
Source :
Journal of biomolecular structure & dynamics [J Biomol Struct Dyn] 2024 Feb 22, pp. 1-13. Date of Electronic Publication: 2024 Feb 22.
Publication Year :
2024
Publisher :
Ahead of Print

Abstract

Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we developed StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, as judged by the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers.
To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins.

Communicated by Ramaswamy H. Sarma.
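
As an illustration of two of the simpler descriptors named in the abstract (amino acid composition and dipeptide composition) and of the Matthews correlation coefficient used to report performance, the following is a minimal Python sketch. It is not the StackPAP implementation, which the abstract does not describe at this level of detail. The function names, the handling of non-standard residues and the zero-denominator convention for the coefficient are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids (one-letter codes), in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 residue frequencies; non-standard residues are ignored (assumption)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 dipeptide frequencies over adjacent residue pairs.

    Pairs containing a non-standard residue are skipped (assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention, assumed here).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Fusing descriptors by concatenation gives a 20 + 400 = 420-dimensional vector.
example_sequence = "MKTAYIAKQR"  # hypothetical sequence, not taken from the paper
fused = amino_acid_composition(example_sequence) + dipeptide_composition(example_sequence)
```

A feature vector such as `fused` would then be passed to the base classifiers; the feature selection (GA-SAR) and stacking steps are outside the scope of this sketch.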

Details

Language :
English
ISSN :
1538-0254
Database :
MEDLINE
Journal :
Journal of biomolecular structure & dynamics
Publication Type :
Academic Journal
Accession number :
38385478
Full Text :
https://doi.org/10.1080/07391102.2024.2318482