72 results on '"Ou YY"'
Search Results
2. VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins.
- Author
- Le VT, Tseng YH, Liu YC, Malik MS, and Ou YY
- Abstract
Vesicular transport is a critical cellular process responsible for the proper organization and functioning of eukaryotic cells. This mechanism relies on specialized vesicles that shuttle macromolecules, such as proteins, across the cellular landscape, a process pivotal to maintaining cellular homeostasis. Disruptions in vesicular transport have been linked to various disease mechanisms, including cancer and neurodegenerative disorders. In this study, we present vesiMCNN, a novel computational approach that integrates pre-trained protein language models with a multi-window scanning convolutional neural network architecture to accurately identify vesicular transport proteins. To the best of our knowledge, this is the first study to leverage the power of pre-trained language models in combination with the multi-window scanning technique for this task. Our method achieved a Matthews Correlation Coefficient (MCC) of 0.558 and an Area Under the Receiver Operating Characteristic (AUC-ROC) of 0.933, outperforming existing state-of-the-art approaches. Additionally, we have curated a comprehensive benchmark dataset for the study of vesicular transport proteins, which can facilitate further research in this field. The remarkable performance of our model, combined with the comprehensive dataset and novel deep learning model, marks a significant advancement in the field of vesicular transport protein research., Competing Interests: Declaration of competing interest I, Le Van The, hereby declare that I have no financial interests or relationships with any organizations that could potentially influence the subject matter of this work. I also confirm that I do not hold any professional or personal affiliations that may be perceived as affecting the impartiality and objectivity of my research. I have received no funding, grants, or honoraria related to the research presented in this work. 
Additionally, I have no personal relationships or collaborations that might pose a conflict of interest. This work is conducted with complete transparency, and I am committed to upholding the highest standards of integrity in my scholarly contributions. (Copyright © 2024 Elsevier B.V. All rights reserved.)
- Published
- 2024
3. Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling.
- Author
- Liu YC, Lin YJ, Chang YY, Chuang CC, and Ou YY
- Abstract
Deciphering the mechanisms governing protein-DNA interactions is crucial for understanding key cellular processes and disease pathways. In this work, we present a powerful deep learning approach that significantly advances the computational prediction of DNA-interacting residues from protein sequences. Our method leverages the rich contextual representations learned by pre-trained protein language models, such as ProtTrans, to capture intrinsic biochemical properties and sequence motifs indicative of DNA binding sites. We then integrate these contextual embeddings with a multi-window convolutional neural network architecture, which scans across the sequence at varying window sizes to effectively identify both local and global binding patterns. Comprehensive evaluation on curated benchmark datasets demonstrates the remarkable performance of our approach, achieving an area under the ROC curve (AUC) of 0.89 - a substantial improvement over previous state-of-the-art sequence-based predictors. This showcases the immense potential of pairing advanced representation learning and deep neural network designs for uncovering the complex syntax governing protein-DNA interactions directly from primary sequences. Our work not only provides a robust computational tool for characterizing DNA-binding mechanisms, but also highlights the transformative opportunities at the intersection of language modeling, deep learning, and protein sequence analysis. The publicly available code and data further facilitate broader adoption and continued development of these techniques for accelerating mechanistic insights into vital biological processes and disease pathways. 
In addition, the code and data for this work are available at https://github.com/B1607/DIRP.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 Elsevier Ltd. All rights reserved.)
- Published
- 2024
4. MCNN_MC: Computational Prediction of Mitochondrial Carriers and Investigation of Bongkrekic Acid Toxicity Using Protein Language Models and Convolutional Neural Networks.
- Author
- Malik MS, Chang YY, Liu YC, Le VT, and Ou YY
- Abstract
Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. ADP/ATP (adenosine diphosphate/adenosine triphosphate) is one of the most important carriers as it contributes to cellular energy production and is susceptible to the powerful toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, has caused four deaths and sickened 30 people. The issue of bongkrekic acid poisoning has been a long-standing problem in Indonesia, with reports as early as 1895 detailing numerous deaths from contaminated coconut fermented cakes. In bioinformatics, significant advances have been made in understanding biological processes through computational methods; however, no established computational method has been developed for identifying mitochondrial carriers. We propose a computational bioinformatics approach for predicting MCs from a broader class of secondary active transporters with a focus on the ADP/ATP carrier and its interaction with bongkrekic acid. The proposed model combines protein language models (PLMs) with multiwindow scanning convolutional neural networks (mCNNs). While PLM embeddings capture contextual information within proteins, mCNN scans multiple windows to identify potential binding sites and extract local features. Our results show 96.66% sensitivity, 95.76% specificity, 96.12% accuracy, 91.83% Matthews correlation coefficient (MCC), 94.63% F1-Score, and 98.55% area under the curve (AUC). The results demonstrate the effectiveness of the proposed approach in predicting MCs and elucidating their functions, particularly in the context of bongkrekic acid toxicity. This study presents a valuable approach for identifying novel mitochondrial complexes, characterizing their functional roles, and understanding mitochondrial toxicology mechanisms. 
Our findings, that utilize computational methods to improve our understanding of cellular processes and drug-target interactions, contribute to the development of therapeutic strategies for mitochondrial disorders, reducing the devastating effects of bongkrekic acid poisoning.
- Published
- 2024
5. ProtTrans and multi-window scanning convolutional neural networks for the prediction of protein-peptide interaction sites.
- Author
- Le VT, Zhan ZJ, Vu TT, Malik MS, and Ou YY
- Subjects
- Humans, Machine Learning, Protein Binding, Binding Sites, Algorithms, Databases, Protein, Neural Networks, Computer, Peptides chemistry, Proteins chemistry, Computational Biology methods
- Abstract
This study delves into the prediction of protein-peptide interactions using advanced machine learning techniques, comparing models such as sequence-based, standard CNNs, and traditional classifiers. Leveraging pre-trained language models and multi-view window scanning CNNs, our approach yields significant improvements, with ProtTrans standing out based on 2.1 billion protein sequences and 393 billion amino acids. The integrated model demonstrates remarkable performance, achieving an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively. Additionally, it attains a Precision of 0.564 in PepBCL Set 1 and 0.527 in PepBCL Set 2, surpassing the performance of previous methods. Beyond this, we explore the application of this model in cancer therapy, particularly in identifying peptide interactions for selective targeting of cancer cells, and other fields. The findings of this study contribute to bioinformatics, providing valuable insights for drug discovery and therapeutic development., Competing Interests: Declaration of competing interest I, Van The Le, hereby declare that I have no financial interests or relationships with any organizations that could potentially influence the subject matter of this work. I also confirm that I do not hold any professional or personal affiliations that may be perceived as affecting the impartiality and objectivity of my research. I have received no funding, grants, or honoraria related to the research presented in this work. Additionally, I have no personal relationships or collaborations that might pose a conflict of interest. This work is conducted with complete transparency, and I am committed to upholding the highest standards of integrity in my scholarly contributions., (Copyright © 2024 Elsevier Inc. All rights reserved.)
- Published
- 2024
6. DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models.
- Author
- Le VT, Malik MS, Tseng YH, Lee YC, Huang CI, and Ou YY
- Subjects
- Deep Learning, Ion Transport, Ion Channels metabolism, Ion Channels chemistry, Neural Networks, Computer
- Abstract
Accurate classification of membrane proteins like ion channels and transporters is critical for elucidating cellular processes and drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then input into a mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best performing model utilizing ProtT5 achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. Our proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and drug development targeting ion channels and transporters. The data and source codes in this study are publicly available at the following link: https://github.com/s1129108/DeepPLM_mCNN., Competing Interests: Declaration of Competing Interest I, Van The Le, hereby declare that I have no financial interests or relationships with any organizations that could potentially influence the subject matter of this work. I also confirm that I do not hold any professional or personal affiliations that may be perceived as affecting the impartiality and objectivity of my research. I have received no funding, grants, or honoraria related to the research presented in this work. 
Additionally, I have no personal relationships or collaborations that might pose a conflict of interest. This work is conducted with complete transparency, and I am committed to upholding the highest standards of integrity in my scholarly contributions. (Copyright © 2024 Elsevier Ltd. All rights reserved.)
- Published
- 2024
7. Ultrasound-guided acupotomy release for treating common peroneal nerve entrapment syndrome: a case description.
- Author
- Ou YY, Li YN, Sun XJ, and Li SL
- Abstract
Competing Interests: Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-23-1277/coif). The authors have no conflicts of interest to declare.
- Published
- 2024
8. Characteristics and clustering analysis of peripheral blood lymphocyte subsets in children with systemic lupus erythematosus complicated with clinical infection.
- Author
- Deng Y, Ou YY, Mo CJ, Huang L, and Qin X
- Subjects
- Humans, Child, Retrospective Studies, Lymphocyte Subsets, Cluster Analysis, T-Lymphocyte Subsets, Coinfection, Lupus Erythematosus, Systemic
- Abstract
Objectives: Clinical infection is a common complication in children with systemic lupus erythematosus (SLE). However, few studies have investigated immune alterations in children with SLE complicated with clinical infection. We assessed lymphocyte subsets in children with SLE to explore the possibility of clinical infection.
Methods: We retrospectively analyzed the proportion of peripheral lymphocyte subsets in 140 children with SLE. Children with SLE were classified into different clusters according to the proportion of peripheral blood lymphocyte subsets (CD3+/CD4+ T cells, CD3+/CD8+ T cells, CD3+/CD4+/CD8+ T cells, CD3+/CD4-/CD8- T cells, CD19+ B cells, and CD3-/CD16+/CD56+ NK cells). Differences in the proportion of lymphoid subsets, infection rates, and systemic lupus erythematosus disease activity index (SLEDAI) scores were compared between clusters. In addition, we grouped the subjects according to the presence or absence of infection. Proportions of lymphoid subsets, demographic variables, clinical presentation, and other laboratory variables were compared between the infected and uninfected groups. Finally, the ability of lymphocyte subset ratios to distinguish secondary infection in children with SLE was assessed using an ROC curve.
Results: Cluster C2 had a higher proportion of B cells than Cluster C1, while Cluster C1 had a lower proportion of NK cells, CD3+ T cells, CD3+/CD4+ T cells, CD3+/CD8+ T cells, and CD3+/CD4-/CD8- T cells. Infection rates and SLEDAI scores were higher in Cluster C2 than in Cluster C1. The infected children had a higher proportion of B cells and a lower proportion of CD3+ T cells, CD3+/CD4+ T cells, CD3+/CD8+ T cells, and CD3+/CD4-/CD8- T cells. There were no significant differences in lymphoid subsets between children in Cluster C2 and the infected groups. The area under the ROC curve of B lymphocytes in predicting SLE children with infection was 0.842. The area under the ROC curve was 0.855 when a combination of B cells, NK cells, CD4+ T cells, and CD8+ T cells was used to predict the outcome of coinfection.
Conclusions: A high percentage of B cells and a low percentage of CD3+ T cells, CD3+/CD4+ T cells, CD3+/CD8+ T cells, CD3+/CD4+/CD8+ T cells, and CD3+/CD4-/CD8- T cells may be associated with infection in children with SLE. B cells were used to predict the outcome of coinfection in children with SLE.
(© 2023. The Author(s), under exclusive licence to International League of Associations for Rheumatology (ILAR).)
- Published
- 2023
9. Integrating Pre-Trained protein language model and multiple window scanning deep learning networks for accurate identification of secondary active transporters in membrane proteins.
- Author
- Shahid Malik M and Ou YY
- Subjects
- Membrane Proteins, Neural Networks, Computer, Machine Learning, Amino Acid Sequence, Deep Learning
- Abstract
Secondary active transporters play pivotal roles in regulating ion and molecule transport across cell membranes, with implications in diseases like cancer. However, studying transporters via biochemical experiments poses challenges. We propose an effective computational approach to identify secondary active transporters from membrane protein sequences using pre-trained language models and deep learning neural networks. Our dataset comprised 290 secondary active transporters and 5,420 other membrane proteins from UniProt. Three types of features were extracted - one-hot encodings, position-specific scoring matrix profiles, and contextual embeddings from the ProtTrans language model. A multi-window convolutional neural network architecture scanned the ProtTrans embeddings using varying window sizes to capture multi-scale sequence patterns. The proposed model combining ProtTrans embeddings and multi-window convolutional neural networks achieved 86% sensitivity, 99% specificity and 98% overall accuracy in identifying secondary active transporters, outperforming conventional machine learning approaches. This work demonstrates the promise of integrating pre-trained language models like ProtTrans with multi-scale deep neural networks to effectively interpret transporter sequences for functional analysis. Our approach enables more accurate computational identification of secondary active transporters, advancing membrane protein research., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier Inc. All rights reserved.)
- Published
- 2023
10. Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models.
- Author
- Ou YY, Ho QT, and Chang HT
- Subjects
- Animals, Horses, Sequence Alignment, Amino Acid Sequence, Sequence Analysis, Protein, Computational Biology methods, Membrane Proteins, Algorithms
- Abstract
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine., (© 2023 Wiley-VCH GmbH.)
- Published
- 2023
11. Simulation and evaluation of increased imaging service capacity at the MRI department using reduced coil-setting times.
- Author
- Sun YC, Wu HM, Guo WY, Ou YY, Yao MJ, and Lee LH
- Subjects
- Humans, Computer Simulation, Hospitals, Appointments and Schedules, Magnetic Resonance Imaging
- Abstract
The wait times for patients from their appointments to receiving magnetic resonance imaging (MRI) are usually long. To reduce this wait time, the present study proposed that service time wastage could be reduced by adjusting MRI examination scheduling by prioritizing patients who require examinations involving the same type of coil. This approach can reduce patient wait times and thereby maximize MRI departments' service times. To simulate an MRI department's action workflow, 2,447 MRI examination logs containing the deidentified information of patients and radiation technologists from the MRI department of a medical center were used, and a hybrid simulation model that combined discrete-event and agent-based simulations was developed. The experiment was conducted in two stages. In the first stage, the service time was increased by adjusting the examination schedule and thereby reducing the number of coil changes. In the second stage, the maximum number of additional patients that could be examined daily was determined. The average number of coil changes per day for the four MRI scanners of the aforementioned medical center was reduced by approximately 27. Thus, the MRI department gained 97.17 min/d, which enabled them to examine three additional patients per month. Consequently, the net monthly income of the hospital increased from US$17,067 to US$30,196, and the patient wait times for MRI examinations requiring the use of flexible torso and head, shoulder, 8-inch head, and torso MRI coils were shortened by 6 d and 23 h, 2 d and 15 h, 2 d and 9 h, and 16 h, respectively. Adjusting MRI examination scheduling by prioritizing patients that require the use of the same coil could reduce the coil-setting time, increase the daily number of patients who are examined, increase the net income of the MRI department, and shorten patient wait times for MRI examinations. 
Minimizing the operating times of specific examinations to maximize the number of services provided per day does not require additional personnel or resources. The results of the experimental simulations can be used as a reference by radiology department managers designing scheduling rules for examination appointments.
Competing Interests: The authors have declared that no competing interests exist.
(Copyright: © 2023 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Published
- 2023
12. Disto-TRP: An approach for identifying transient receptor potential (TRP) channels using structural information generated by AlphaFold.
- Author
- Muazzam Ali Shah S and Ou YY
- Subjects
- Artificial Intelligence, Databases, Protein, Transient Receptor Potential Channels chemistry, Transient Receptor Potential Channels metabolism
- Abstract
The ability to predict 3D protein structures computationally has significantly advanced biological research. The AlphaFold protein structure database, developed by DeepMind, has provided a wealth of predicted protein structures and has the potential to bring about revolutionary changes in the field of life sciences. However, directly determining the function of proteins from their structures remains a challenging task. The Distogram from AlphaFold is used in this study as a novel feature set to identify transient receptor potential (TRP) channels. Distograms feature vectors and pre-trained language model (BERT) features were combined to improve prediction performance for transient receptor potential (TRP) channels. The method proposed in this study demonstrated promising performance on many evaluation metrics. For five-fold cross-validation, the method achieved a Sensitivity (SN) of 87.00%, Specificity (SP) of 93.61%, Accuracy (ACC) of 93.39%, and a Matthews correlation coefficient (MCC) of 0.52. Additionally, on an independent dataset, the method obtained 100.00% SN, 95.54% SP, 95.73% ACC, and an MCC of 0.69. The results demonstrate the potential for using structural information to predict protein function. In the future, it is hoped that such structural information will be incorporated into artificial intelligence networks to explore more useful and valuable functional information in the biological field., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
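The abstract above reports sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) side by side. As a reading aid only, the sketch below computes these four metrics from confusion-matrix counts. It is not taken from the paper; the function name `binary_metrics` and the example counts are hypothetical and chosen only to illustrate why MCC can be much lower than accuracy when positives are rare.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC for a binary classifier.

    tp/tn/fp/fn are confusion-matrix counts. Ratios with a zero
    denominator are reported as 0.0 (a common convention, assumed here).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)                      # sensitivity (recall)
    sp = ratio(tn, tn + fp)                      # specificity
    acc = ratio(tp + tn, tp + tn + fp + fn)      # overall accuracy
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)      # Matthews correlation coefficient
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical, heavily imbalanced test set: 10 positives, 1000 negatives.
example = binary_metrics(tp=10, tn=950, fp=50, fn=0)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts every positive is found and accuracy is about 95%, yet MCC stays well below that because most predicted positives are false positives.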
13. Peripheral blood lymphocyte subsets in children with nephrotic syndrome: a retrospective analysis.
- Author
- Deng Y, Ou YY, Mo CJ, Huang L, Qin X, and Li S
- Subjects
- Child, Humans, Retrospective Studies, Lymphocyte Subsets, B-Lymphocytes, CD8-Positive T-Lymphocytes, Antigens, CD19, Lymphocyte Count, Nephrotic Syndrome
- Abstract
Background: Nephrotic syndrome (NS) in children is widely believed to be associated with severe changes in the immune system. Based on lymphocyte subset analysis, we examined the pathogenesis of immune deficiencies in children with NS with varying steroid sensitivity.
Methods: Our study utilized flow cytometry to retrospectively analyze the ratios of lymphocyte subsets in 204 children with nephrotic syndrome and 19 healthy children.
Results: Compared with healthy children, the CD4+/CD8+ ratio at onset and in remission was decreased in the SRNS group (p < 0.05), and CD19+ B lymphocytes were increased at onset (p < 0.05). Compared with onset, the proportion of CD19+ B lymphocytes decreased in SRNS, while the proportion of CD19+ B lymphocytes increased in SDNS (p < 0.01). The CD8+ T/CD19+ B ratio at onset in the SDNS group was significantly higher than that in the SSNS and SRNS groups (p < 0.01) and the healthy control group (p < 0.05). Compared with onset, the CD8+ T/CD19+ B ratio in the SDNS group decreased significantly (p < 0.01), while the CD8+ T/CD19+ B ratio in the SRNS group increased significantly (p < 0.01). The proportion of CD56+CD16+ NK cells was significantly reduced in children with INS (p < 0.01).
Conclusion: CD8+ T lymphocytes may be involved in the mechanism of lymphocyte subset disorder during onset of SDNS, while CD19+ B lymphocytes may be involved in the mechanism of lymphocyte subset disorder during relapse of SDNS. The CD8+ T/CD19+ B ratio may predict the degree of frequent recurrence. There is a certain degree of lymphoid subset disorder in children with NS.
(© 2023. The Author(s).)
- Published
- 2023
14. Evaluation of the Prognostic Value of Existing Scoring Systems for Nosocomial Infection in Patients with Decompensated Liver Cirrhosis.
- Author
- Zhao X, Ou YY, Guo D, Che XQ, and Li ZQ
- Subjects
- Humans, Prognosis, Severity of Illness Index, Liver Cirrhosis, ROC Curve, Predictive Value of Tests, Retrospective Studies, End Stage Liver Disease complications, Cross Infection
- Abstract
Background: Many scoring systems have been developed to evaluate the severity and survival of patients with end-stage liver disease. However, the performance of these predictive models has not been thoroughly validated in cirrhotic patients with nosocomial infections. This study aimed to compare the predictive accuracy of various scoring systems.
Methods: Between January 2015 and January 2020, patients with liver cirrhosis and nosocomial infections were enrolled in this study. The clinical data, laboratory findings, and demographic characteristics of patients were collected at diagnosis. Patients were followed up for at least 6 months or until death.
Results: One hundred thirty-one patients meeting the criteria were enrolled and followed up for at least 6 months. The mortality rate at 30 days, 3 months, and 6 months was 23%, 35.1%, and 39.6%, respectively. Univariate analysis showed that all scoring systems differed significantly between the surviving group and the non-surviving group at 6 months. Model for end-stage liver disease-Na showed excellent accuracy in predicting survival at 30 days, 3 months, and 6 months, with areas under the curve of 0.807, 0.850, and 0.844, respectively. Model for end-stage liver disease-Na demonstrated sensitivities of more than 85%. In contrast, the Child-Turcotte-Pugh and albumin-bilirubin scores showed a poorer predictive capability.
Conclusion: All 5 model for end-stage liver disease-related scores (model for end-stage liver disease, model for end-stage liver disease to serum sodium ratio, model for end-stage liver disease-Na, model for end-stage liver disease-Delta, and integrated model for end-stage liver disease) reliably predicted both short-term and long-term mortality in cirrhotic patients with nosocomial infections. Among them, the model for end-stage liver disease-Na score might be the best choice.
- Published
- 2023
15. MFPS_CNN: Multi-filter Pattern Scanning from Position-specific Scoring Matrix with Convolutional Neural Network for Efficient Prediction of Ion Transporters.
- Author
- Nguyen TT, Ho QT, Tarn YC, and Ou YY
- Subjects
- Ions, Membrane Proteins, Position-Specific Scoring Matrices, Algorithms, Neural Networks, Computer
- Abstract
In cellular transportation mechanisms, the movement of ions across the cell membrane and its proper control are important for cells, especially for life processes. Ion transporters/pumps and ion channel proteins work as border guards controlling the incessant traffic of ions across cell membranes. We revisited the study of classification of transporters and ion channels from membrane proteins with a more efficient deep learning approach. Specifically, we applied multi-window scanning filters of convolutional neural networks on almost full-length position-specific scoring matrices for extracting useful information. In this way, we were able to retain important evolutionary information of the proteins. Our experiment results show that a convolutional neural network with a minimum number of convolutional layers can be enough to extract the conserved information of proteins which leads to higher performance. Our best prediction models were obtained after examining different data imbalanced handling techniques, and different protein encoding methods. We also showed that our models were superior to traditional deep learning approaches on the same datasets as well as other machine learning classification algorithms., (© 2022 Wiley-VCH GmbH.)
- Published
- 2022
16. Using multiple convolutional window scanning of convolutional neural network for an efficient prediction of ATP-binding sites in transport proteins.
- Author
- Nguyen TT, Chen S, Ho QT, and Ou YY
- Subjects
- Algorithms, Binding Sites, Machine Learning, Neural Networks, Computer, Proteins chemistry, Adenosine Triphosphate, Carrier Proteins
- Abstract
Protein multiple sequence alignment information has long been an important feature for inferring the functions of proteins from related sequences with known functions. It is also one of the underlying ideas of AlphaFold 2, a breakthrough model for predicting the three-dimensional structures of proteins from their primary sequences. Our study used protein multiple sequence alignment information in the form of position-specific scoring matrices as input. We also refined the use of a convolutional neural network, a well-known deep-learning architecture with impressive achievements on image and image-like data. Specifically, we revisited the prediction of adenosine triphosphate (ATP)-binding sites with more efficient convolutional neural networks. We applied multiple convolutional window scanning filters of a convolutional neural network to position-specific scoring matrices to extract as much useful information as possible. Furthermore, only the most specific motifs are retained at each feature map output through the one-max pooling layer before going to the next layer. We assumed that this would help retain the most conserved motifs, which are discriminative information for prediction. Our experimental results show that a convolutional neural network with relatively few convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining different hyper-parameters. Our experimental results showed that our models were superior to the traditional use of convolutional neural networks on the same datasets, as well as to other machine-learning classification algorithms. (© 2022 Wiley Periodicals LLC.)
- Published
- 2022
- Full Text
- View/download PDF
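The multiple-window scanning and one-max pooling described in the abstract of entry 16 can be illustrated with a minimal sketch. This is not the authors' implementation: the names `scan_one_max` and `multi_window_features`, the ReLU activation, and the toy filter values are assumptions made for illustration, and a real model would learn its filters during training.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = sequence positions, columns = per-residue scores


def scan_one_max(pssm: Matrix, kernel: Matrix) -> float:
    """Slide one filter along the sequence axis and keep only its strongest response."""
    window = len(kernel)
    if window > len(pssm):
        raise ValueError("filter window is longer than the sequence")
    best = float("-inf")
    for start in range(len(pssm) - window + 1):
        # Dot product between the filter and the PSSM rows it currently covers.
        response = sum(
            k * x
            for k_row, x_row in zip(kernel, pssm[start:start + window])
            for k, x in zip(k_row, x_row)
        )
        # ReLU (assumed activation), then running one-max pooling.
        best = max(best, max(response, 0.0))
    return best


def multi_window_features(pssm: Matrix, kernels: Sequence[Matrix]) -> List[float]:
    """Apply filters with different window sizes in parallel: one pooled value per filter."""
    return [scan_one_max(pssm, kernel) for kernel in kernels]
```

Because each filter contributes a single pooled value regardless of its window size, filters of different lengths can be concatenated into one fixed-length feature vector before the prediction layer.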
17. Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model.
- Author
-
Nguyen TT, Ho QT, Le NQ, Phan VD, and Ou YY
- Subjects
- Electron Transport, Electrons, Support Vector Machine, Carrier Proteins, Computational Biology methods
- Abstract
Living organisms receive necessary energy substances directly from cellular respiration. The completion of electron storage and transportation requires the process of cellular respiration with the aid of electron transport chains. Therefore, the work of deciphering electron transport proteins is inevitably needed. The identification of these proteins with high performance has a prompt dependence on the choice of methods for feature extraction and machine learning algorithm. In this study, protein sequences served as natural language sentences comprising words. The nominated word embedding-based feature sets, hinged on the word embedding modulation and protein motif frequencies, were useful for feature choosing. Five word embedding types and a variety of conjoint features were examined for such feature selection. The support vector machine algorithm consequentially was employed to perform classification. The performance statistics within the 5-fold cross-validation including average accuracy, specificity, sensitivity, as well as MCC rates surpass 0.95. Such metrics in the independent test are 96.82, 97.16, 95.76 percent, and 0.9, respectively. Compared to state-of-the-art predictors, the proposed method can generate more preferable performance above all metrics indicating the effectiveness of the proposed method in determining electron transport proteins. Furthermore, this study reveals insights about the applicability of various word embeddings for understanding surveyed sequences.
- Published
- 2022
- Full Text
- View/download PDF
18. mCNN-ETC: identifying electron transporters and their functional families by using multiple windows scanning techniques in convolutional neural networks with evolutionary information of protein sequences.
- Author
-
Ho QT, Le NQK, and Ou YY
- Subjects
- Amino Acid Sequence, Biological Evolution, Humans, Proteins chemistry, Electrons, Neural Networks, Computer
- Abstract
In the past decade, convolutional neural networks (CNNs) have been used as powerful tools by scientists to solve visual data tasks. However, many efforts of convolutional neural networks in solving protein function prediction and extracting useful information from protein sequences have certain limitations. In this research, we propose a new method to improve the weaknesses of the previous method. mCNN-ETC is a deep learning model which can transform the protein evolutionary information into image-like data composed of 20 channels, which correspond to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the useful pattern detection ability of the proposed model. Then we filtered specific patterns through the 1-max pooling layer before inputting them into the prediction layer. This research attempts to solve a basic problem in biology in terms of application: predicting electron transporters and classifying their corresponding complexes. The performance result reached an accuracy of 97.41%, which was nearly 6% higher than its predecessor. We have also published a web server on http://bio219.bioinfo.yzu.edu.tw, which can be used for research purposes free of charge., (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2022
- Full Text
- View/download PDF
19. An Extensive Examination of Discovering 5-Methylcytosine Sites in Genome-Wide DNA Promoters Using Machine Learning Based Approaches.
- Author
-
Nguyen TT, Tran TA, Le NQ, Pham DM, and Ou YY
- Subjects
- DNA, DNA Methylation genetics, Promoter Regions, Genetic genetics, 5-Methylcytosine, Machine Learning
- Abstract
It is well known that the major reasons for the rapid proliferation of cancer cells are the hypomethylation of the whole cancer genome and the hypermethylation of the promoters of particular tumor suppressor genes. Locating 5-methylcytosine (5mC) sites in promoters is therefore a crucial step in further understanding the relationship between promoter methylation and the regulation of mRNA gene expression. High-throughput identification of DNA 5mC in the wet lab is still time-consuming and labor-intensive. Thus, finding the 5mC sites of genome-wide DNA promoters is still an important task. We compared the effectiveness of the most popular and strong machine learning techniques, namely XGBoost, Random Forest, Deep Forest, and Deep Feedforward Neural Network, in predicting the 5mC sites of genome-wide DNA promoters. A feature extraction method based on k-mer embeddings learned from a language model was also applied. Overall, the performance of all the surveyed models surpassed that of the deep learning models of the latest studies on the same dataset employing other encoding schemes. Furthermore, the best model achieved AUC scores of 0.962 on both cross-validation and independent test data. We concluded that our approach was efficient for identifying 5mC sites of promoters with high performance.
- Published
- 2022
- Full Text
- View/download PDF
20. Using k-mer embeddings learned from a Skip-gram based neural network for building a cross-species DNA N6-methyladenine site prediction model.
- Author
-
Nguyen TTD, Trinh VN, Le NQK, and Ou YY
- Subjects
- Adenine metabolism, Base Sequence, DNA, Plant genetics, Databases, Genetic, Nucleotides genetics, Plants genetics, ROC Curve, Surveys and Questionnaires, Adenine analogs & derivatives, Algorithms, Models, Biological, Neural Networks, Computer
- Abstract
Key Message: This study used k-mer embeddings as effective features to identify DNA N6-methyladenine sites in plant genomes and obtained improved performance without substantial effort in feature extraction, combination and selection. Identification of DNA N6-methyladenine sites has been a very active topic of computational biology due to the unavailability of suitable methods to identify them accurately, especially in plants. Substantial results were obtained with great effort put into extracting, heuristic searching, or fusing diverse types of features, not to mention a feature selection step. In this study, we regarded DNA sequences as textual information and employed natural language processing techniques to decipher hidden biological meanings from those sequences. In other words, we considered DNA, the human life book, as a book corpus for training DNA language models. K-mer embeddings were then generated from these language models to be used in machine learning prediction models. Skip-gram neural networks were the base of the language models and ensemble tree-based algorithms were the machine learning algorithms for prediction models. We trained the prediction model on Rosaceae genome dataset and performed a comprehensive test on 3 plant genome datasets. Our proposed method shows promising performance with AUC performance approaching an ideal value on Rosaceae dataset (0.99), a high score on Rice dataset (0.95) and improved performance on Rice dataset while enjoying an elegant, yet efficient feature extraction process., (© 2021. The Author(s), under exclusive licence to Springer Nature B.V.)
- Published
- 2021
- Full Text
- View/download PDF
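As a rough illustration of how a DNA sequence can be treated as text for a Skip-gram model, as entry 20 describes, the sketch below splits a sequence into k-mer "words" and lists the (center, context) pairs that such a model would be trained on. The function names, the overlapping stride of 1, `k = 3`, and the context window size are assumptions; the abstract does not state the settings the authors used.

```python
from typing import List, Tuple


def kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("words") with stride 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """List (center, context) pairs, the training examples of a Skip-gram model."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        # Every neighbour inside the window, except the center word itself.
        pairs.extend((center, words[j]) for j in range(lo, hi) if j != i)
    return pairs
```

For example, `kmers("ACGTA")` yields the three words `ACG`, `CGT` and `GTA`, and `skipgram_pairs` pairs each of them with its neighbours inside the chosen window.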
21. Identification of efflux proteins based on contextual representations with deep bidirectional transformer encoders.
- Author
-
Taju SW, Shah SMA, and Ou YY
- Subjects
- Carrier Proteins analysis, Computational Biology, Natural Language Processing, Support Vector Machine
- Abstract
Efflux proteins are the transport proteins expressed in the plasma membrane, which are involved in the movement of unwanted toxic substances through specific efflux pumps. Several studies based on computational approaches have been proposed to predict transport proteins and thereby to understand the mechanism of the movement of ions across cell membranes. However, few methods were developed to identify efflux proteins. This paper presents an approach based on the contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) with the Support Vector Machine (SVM) classifier. BERT is the most effective pre-trained language model that performs exceptionally well on several Natural Language Processing (NLP) tasks. Therefore, the contextualized representations from BERT were implemented to incorporate multiple interpretations of identical amino acids in the sequence. A dataset of efflux proteins with annotations was first established. The feature vectors were extracted by transferring protein data through the hidden layers of the pre-trained model. Our proposed method was trained on complete training datasets to identify efflux proteins and achieved the accuracies of 94.15% and 87.13% in the independent tests on membrane and transport datasets, respectively. This study opens a research avenue for the implementation of contextualized word embeddings in Bioinformatics and Computational Biology., (Copyright © 2021 Elsevier Inc. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
22. Addressing data imbalance problems in ligand-binding site prediction using a variational autoencoder and a convolutional neural network.
- Author
-
Nguyen TT, Nguyen DK, and Ou YY
- Subjects
- Algorithms, Deep Learning, Ligands, Neural Networks, Computer
- Abstract
Since 2015, a fast growing number of deep learning-based methods have been proposed for protein-ligand binding site prediction and many have achieved promising performance. These methods, however, neglect the imbalanced nature of binding site prediction problems. Traditional data-based approaches for handling data imbalance employ linear interpolation of minority class samples. Such approaches may not be fully exploited by deep neural networks on downstream tasks. We present a novel technique for balancing input classes by developing a deep neural network-based variational autoencoder (VAE) that aims to learn important attributes of the minority classes concerning nonlinear combinations. After learning, the trained VAE was used to generate new minority class samples that were later added to the original data to create a balanced dataset. Finally, a convolutional neural network was used for classification, for which we assumed that the nonlinearity could be fully integrated. As a case study, we applied our method to the identification of FAD- and FMN-binding sites of electron transport proteins. Compared with the best classifiers that use traditional machine learning algorithms, our models obtained a great improvement on sensitivity while maintaining similar or higher levels of accuracy and specificity. We also demonstrate that our method is better than other data imbalance handling techniques, such as SMOTE, ADASYN, and class weight adjustment. Additionally, our models also outperform existing predictors in predicting the same binding types. Our method is general and can be applied to other data types for prediction problems with moderate-to-heavy data imbalances., (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
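The abstract of entry 22 contrasts its variational autoencoder with traditional oversampling based on linear interpolation of minority-class samples. The sketch below shows only that baseline idea, in a simplified form that interpolates between random pairs; it is not the authors' VAE, and it is not full SMOTE, which interpolates toward nearest neighbours. The function name and parameters are hypothetical.

```python
import random
from typing import List, Optional

Vector = List[float]


def interpolate_minority(
    samples: List[Vector], n_new: int, seed: Optional[int] = None
) -> List[Vector]:
    """Create synthetic minority samples on line segments between random pairs."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)  # two distinct minority samples
        t = rng.random()               # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Every synthetic point lies on a straight line between two existing samples, which is the linearity the abstract says a VAE-based generator is meant to go beyond.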
23. TRP-BERT: Discrimination of transient receptor potential (TRP) channels using contextual representations from deep bidirectional transformer based on BERT.
- Author
-
Ali Shah SM and Ou YY
- Subjects
- Amino Acid Sequence, Animals, Computational Biology, Natural Language Processing, Neural Networks, Computer, Support Vector Machine, Transient Receptor Potential Channels genetics
- Abstract
Transient receptor potential (TRP) channels are non-selective cation channels that act as ion channels and are primarily found on the plasma membrane of numerous animal cells. These channels are involved in the physiology and pathophysiology of a wide variety of biological processes, including inhibition and progression of cancer, pain initiation, inflammation, regulation of pressure, thermoregulation, secretion of salivary fluid, and homeostasis of Ca2+ and Mg2+. Increasing evidence indicates that mutations in the genes encoding TRP channels play an essential role in a broad array of diseases. Therefore, these channels are becoming popular as potential drug targets for several diseases. The diversified role of these channels demands a prediction model to classify TRP channels from other channel proteins (non-TRP channels). Therefore, we presented an approach based on the Support Vector Machine (SVM) classifier and contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a deeply bidirectional language model and a neural network approach to Natural Language Processing (NLP) that achieves outstanding performance on various NLP tasks. We apply BERT to generate contextualized representations for every single amino acid in a protein sequence. Interestingly, these representations are context-sensitive and vary for the same amino acid appearing in different positions in the sequence. Our proposed method showed 80.00% sensitivity, 96.03% specificity, 95.47% accuracy, and a 0.56 Matthews correlation coefficient (MCC) for an independent test set. We suggest that our proposed method could effectively classify TRP channels from non-TRP channels and assist biologists in identifying new potential TRP channels., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
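Many abstracts in this list report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) together. As a reading aid, the sketch below computes all four from a binary confusion matrix using their standard definitions. The function name is hypothetical, and the counts in the usage note are invented for illustration and are not taken from any of the studies.

```python
import math
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (a convention assumed here)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),           # true positive rate
        "specificity": _ratio(tn, tn + fp),           # true negative rate
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

With hypothetical counts such as `binary_metrics(tp=8, tn=96, fp=4, fn=2)`, accuracy is dominated by the large negative class, whereas MCC also penalizes false positives relative to the small positive class. This is why MCC can look modest next to a high accuracy on imbalanced test sets.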
24. A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information.
- Author
-
Le NQK, Ho QT, Nguyen TT, and Ou YY
- Subjects
- Computer Simulation, Data Accuracy, Humans, Multilingualism, Semantics, Sensitivity and Specificity, Transcription, Genetic, Computational Biology methods, DNA genetics, Deep Learning, Enhancer Elements, Genetic, Models, Biological, Natural Language Processing
- Abstract
Recently, language representation models have drawn a lot of attention in the natural language processing field due to their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple, yet powerful language model that achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embedding to capture the semantics and context of the words in which they appeared. In this study, we present a novel technique by incorporating BERT-based multilingual model in bioinformatics to represent the information of DNA sequences. We treated DNA sequences as natural sentences and then used BERT models to transform them into fixed-length numerical matrices. As a case study, we applied our method to DNA enhancer prediction, which is a well-known and challenging problem in this field. We then observed that our BERT-based features improved more than 5-10% in terms of sensitivity, specificity, accuracy and Matthews correlation coefficient compared to the current state-of-the-art features in bioinformatics. Moreover, advanced experiments show that deep learning (as represented by 2D convolutional neural networks; CNN) holds potential in learning BERT features better than other traditional machine learning techniques. In conclusion, we suggest that BERT and 2D CNNs could open a new avenue in biological modeling using sequence information., (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
25. ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations.
- Author
-
Taju SW, Shah SMA, and Ou YY
- Subjects
- Amino Acid Sequence, Biological Transport, Active, Carrier Proteins chemistry, Carrier Proteins metabolism, Natural Language Processing, Support Vector Machine
- Abstract
Motivation: Primary and secondary active transport are two types of active transport that involve using energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model in transfer learning, a deep learning language representation model developed by Google and one of the highest performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. Therefore, the contextualized word representations of proteins are introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in the context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence., Results: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves an accuracy of 85.44 %, 88.74 % and 92.84 % for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods by using context information, effectively classify two types of active transport and improve the overall performance., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
26. DeepSIRT: A deep neural network for identification of sirtuin targets and their subcellular localizations.
- Author
-
Shah SMA, Taju SW, Dlamini BB, and Ou YY
- Subjects
- Humans, Deep Learning, Neural Networks, Computer, Sirtuins analysis
- Abstract
Sirtuins are a family of proteins that play a key role in regulating a wide range of cellular processes including DNA regulation, metabolism, aging/longevity, cell survival, apoptosis, and stress resistance. Sirtuins are protein deacetylases and are included in the class III family of histone deacetylase enzymes (HDACs). The class III HDACs contain seven members of the sirtuin family, from SIRT1 to SIRT7. The seven members of the sirtuin family have various substrates and are present in nearly all subcellular localizations including the nucleus, cytoplasm, and mitochondria. In this study, a deep neural network approach using one-dimensional Convolutional Neural Networks (CNN) was proposed to build a prediction model that can accurately identify the outcome of the sirtuin protein by targeting their subcellular localizations. Therefore, the function and localization of sirtuin targets were analyzed and annotated to compartmentalize them into distinct subcellular localizations. We further reduced the sequence similarity between protein sequences, and three feature extraction methods were applied to the datasets. Finally, the proposed method was tested and compared with various machine-learning algorithms. The proposed method was validated on two independent datasets and showed an average of up to 85.77 % sensitivity, 97.32 % specificity, and 0.82 MCC for seven members of the sirtuin family of proteins., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
27. FAD-BERT: Improved prediction of FAD binding sites using pre-training of deep bidirectional transformers.
- Author
-
Ho QT, Nguyen TT, Khanh Le NQ, and Ou YY
- Subjects
- Amino Acid Sequence, Binding Sites, Electric Power Supplies, Amino Acids, Flavin-Adenine Dinucleotide metabolism
- Abstract
The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process to transfer electrons and other macromolecules throughout the cell. Identifying Flavin Adenine Dinucleotide (FAD) binding sites in the electron transport chain is vital since it helps biological researchers precisely understand how electrons are produced and transported in cells. This study distills and analyzes the contextualized word embedding from pre-trained BERT models to explore similarities in natural language and protein sequences. Thereby, we propose a new approach based on Pre-training of Bidirectional Encoder Representations from Transformers (BERT), Position-specific Scoring Matrix profiles (PSSM), and the Amino Acid Index database (AAIndex) to predict FAD-binding sites from the transport proteins which are found in nature recently. Our proposed approach achieves 85.14% accuracy and improves accuracy by 11%, with a Matthews correlation coefficient of 0.39, compared to the previous method on the same independent set. We also deploy a web server that identifies FAD-binding sites in electron transporters, available for academics at http://140.138.155.216/fadbert/., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
28. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models.
- Author
-
Ali Shah SM, Taju SW, Ho QT, Nguyen TT, and Ou YY
- Subjects
- Glucose, Language, Semantics, Glucose Transport Proteins, Facilitative, Natural Language Processing
- Abstract
Recently, language representation models have drawn a lot of attention in the field of natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple, yet powerful language model that has achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics and context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length meaningful vectors where a 768- or 1024-dimensional vector represents each amino acid. We observed that BERT-Base and BERT-Large models improved the performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), indicating the efficiency of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models., (Copyright © 2021. Published by Elsevier Ltd.)
- Published
- 2021
- Full Text
- View/download PDF
29. Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels.
- Author
-
Nguyen TT, Le NQ, Tran TA, Pham DM, and Ou YY
- Subjects
- Amino Acid Sequence, Glycosylation, Ion Channels, Amino Acids, Machine Learning
- Abstract
Glycosylation is a dynamic enzymatic process that attaches glycan to proteins or other organic molecules such as lipoproteins. Research has shown that such a process in ion channel proteins plays a fundamental role in modulating ion channel functions. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins. From segments of ion channel proteins centered around N-linked glycosylation sites, the amino acid embedding vectors of each residue were concatenated to create features for prediction. We experimented with two different models for converting amino acids to their corresponding embeddings: one was fed with ion channel sequences and the other with a large dataset composed of more than one million protein sequences. The latter model stemmed from the idea of transfer learning technique and emerged as a more efficient feature extractor. Our best model was obtained from this transfer learning approach and a hyperparameter tuning process with a random search on 5-fold cross-validation data. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient of 93.4%, 92.8%, 98.6%, and 0.726, respectively. Corresponding scores on an independent test were 92.9%, 92.2%, 99%, and 0.717. These results outperform the position-specific scoring matrix features that are predominantly employed in post-translational modification site predictions. Furthermore, compared to N-GlyDE, GlycoEP, SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on the above 4 metrics, thus further demonstrating the efficiency of our approach., (Copyright © 2021. Published by Elsevier Ltd.)
- Published
- 2021
- Full Text
- View/download PDF
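The feature construction outlined in the abstract of entry 29, with segments centered on candidate sites and per-residue embedding vectors concatenated, might look roughly like the sketch below. The window half-width, the `X` padding character, the choice to treat every asparagine (N) as a candidate, and the toy embedding table are all assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def site_windows(sequence: str, half_width: int = 2, pad: str = "X") -> Dict[int, str]:
    """Return a fixed-length segment centered on every asparagine (N) in the sequence."""
    # Pad both ends so that sites near the termini still get full-length windows.
    padded = pad * half_width + sequence + pad * half_width
    return {
        i: padded[i:i + 2 * half_width + 1]
        for i, residue in enumerate(sequence)
        if residue == "N"
    }


def concat_embeddings(segment: str, table: Dict[str, List[float]]) -> List[float]:
    """Concatenate the embedding vector of each residue into one feature vector."""
    features: List[float] = []
    for residue in segment:
        features.extend(table[residue])
    return features
```

In this sketch the embedding table is just a dictionary. In the study it would come from a model trained either on ion channel sequences or on the larger protein corpus mentioned in the abstract.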
30. TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings.
- Author
-
Nguyen TT, Le NQ, Ho QT, Phan DV, and Ou YY
- Subjects
- Amino Acid Sequence, Humans, Natural Language Processing, Tumor Necrosis Factors chemistry, Computational Biology, Machine Learning, Tumor Necrosis Factors metabolism
- Abstract
Background: Cytokines are a class of small proteins that act as chemical messengers and play a significant role in essential cellular processes including immunity regulation, hematopoiesis, and inflammation. As one important family of cytokines, tumor necrosis factors are associated with the regulation of various biological processes such as proliferation and differentiation of cells, apoptosis, lipid metabolism, and coagulation. The implication of these cytokines can also be seen in various diseases such as insulin resistance, autoimmune diseases, and cancer. Considering the interdependence between this kind of cytokine and others, classifying tumor necrosis factors from other cytokines is a challenge for biological scientists., Methods: In this research, we employed a word embedding technique to create hybrid features, which proved to efficiently identify tumor necrosis factors given cytokine sequences. We segmented each protein sequence into protein words and created a corresponding word embedding for each word. Then, a word embedding-based vector for each sequence was created and input into machine learning classification models. When extracting feature sets, we not only diversified segmentation sizes of protein sequences but also conducted different combinations among split grams to find the best features, which generated the optimal prediction. Furthermore, our methodology follows a well-defined procedure to build a reliable classification tool., Results: With our proposed hybrid features, prediction models obtain more promising performance compared to seven prominent sequence-based feature types. Results from 10 independent runs on the surveyed dataset show that, on average, our optimal models obtain an area under the curve of 0.984 and 0.998 on 5-fold cross-validation and independent test, respectively., Conclusions: These results show that biologists can use our model to identify tumor necrosis factors from other cytokines efficiently. Moreover, this study proves that natural language processing techniques can be applied reasonably to help biologists solve bioinformatics problems efficiently.
- Published
- 2020
- Full Text
- View/download PDF
31. Using Language Representation Learning Approach to Efficiently Identify Protein Complex Categories in Electron Transport Chain.
- Author
-
Nguyen TT, Le NQ, Ho QT, Phan DV, and Ou YY
- Subjects
- Amino Acid Sequence, Electron Transport, Humans, Natural Language Processing, Support Vector Machine, Word Processing, Computational Biology methods, Multiprotein Complexes classification, Multiprotein Complexes metabolism
- Abstract
We herein proposed a novel approach based on the language representation learning method to categorize electron complex proteins into 5 types. The idea stems from the shared characteristics of human language and protein sequence language; thus, advanced natural language processing techniques were used for extracting useful features. Specifically, we employed transfer learning and word embedding techniques to analyze electron complex sequences and create efficient feature sets before using a support vector machine algorithm to classify them. During the 5-fold cross-validation processes, seven types of sequence-based features were analyzed to find the optimal features. On average, our final classification models achieved an accuracy, specificity, sensitivity, and MCC of 96 %, 96.1 %, 95.3 %, and 0.86, respectively, on cross-validation data. For the independent test data, the corresponding performance scores are 95.3 %, 92.6 %, 94 %, and 0.87. We concluded that, using features extracted with these representation learning methods, the prediction performance of a simple machine learning algorithm is on par with the existing deep neural network method on the task of categorizing electron complexes while enjoying a much faster way of generating features. Furthermore, the results also showed that the combination of features learned from the representation learning methods and sequence motif counts helps yield better performance., (© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2020
- Full Text
- View/download PDF
32. [Dissection of seed development of Gastrodia elata at different temperatures].
- Author
-
Yuan QS, Wang H, Jiang WK, Ou XH, Xu J, Wang XA, Wang L, Ou YY, and Zhou T
- Subjects
- Phenotype, Reproduction, Fruit growth & development, Gastrodia growth & development, Seeds growth & development, Temperature
- Abstract
The study aimed to create seed materials and dissect the molecular mechanism of sexual propagation of Gastrodia elata. In this research, thirteen characteristics of the flowers, flower stems, fruits, seeds and embryos of G.elata f. glauca and G.elata f. elata after bolting at room temperature(RT) and constant temperature(CT, 22 ℃) were determined. It was found that the constant temperature condition could prolong the bolting duration of G.elata and increase the number of flowers, while the variety of G.elata only affected the bolting duration, but had no effect on the number of flowers, and G.elata f. elata was more likely to bolt than G.elata f. glauca. The variety of G.elata was the main factor affecting the time of fruit dehiscence of G.elata, the temperature was the main factor affecting the fruit number and fruit diameter, and the constant temperature was more conducive to the fruit shape of G.elata than the room temperature. There was no significant difference in seed phenotype between G.elata varieties, but the seed embryo of G.elata seeds cultivated at constant temperature was fuller than that of G.elata cultivated at room temperature, and temperature had less influence on the seed phenotype of G.elata. But it was interesting to find that temperature and variety had greater influence on the seed embryo of G.elata, constant temperature cultivation was more conducive to the formation of the seed embryo of G.elata, and the seed embryo of G.elata f. elata was easier to form than the seed embryo of G.elata f. glauca. However, the development of seeds and embryos of G.elata was significantly affected, and the development of seeds and embryos of G.elata f. glauca was more sensitive to temperature than that of G.elata f. elata. The research suggested that it is advisable for G.elata to produce seed materials by bolting at constant temperature(22 ℃).
- Published
- 2020
33. Prediction of ATP-binding sites in membrane proteins using a two-dimensional convolutional neural network.
- Author
-
Nguyen TT, Le NQ, Kusuma RMI, and Ou YY
- Subjects
- Adenosine Triphosphate metabolism, Algorithms, Amino Acid Motifs, Amino Acid Sequence, Computational Biology methods, Databases, Protein, Machine Learning, Membrane Proteins metabolism, Position-Specific Scoring Matrices, ROC Curve, Reproducibility of Results, Web Browser, Adenosine Triphosphate chemistry, Binding Sites, Membrane Proteins chemistry, Models, Theoretical, Neural Networks, Computer
- Abstract
Membrane proteins, the most important drug targets, account for around 30% of total proteins encoded by the genome of living organisms. An important role of these proteins is to bind adenosine triphosphate (ATP), facilitating crucial biological processes such as metabolism and cell signaling. There are several reports elucidating ATP-binding sites within proteins. However, such studies on membrane proteins are limited. Our prediction tool, DeepATP, combines evolutionary information in the form of a Position-Specific Scoring Matrix with a two-dimensional Convolutional Neural Network to predict ATP-binding sites in membrane proteins with an MCC of 0.89 and an AUC of 99%. Compared to recently published ATP-binding site predictors and classifiers that use traditional machine learning algorithms, our approach performs significantly better. We suggest this method as a reliable tool for biologists for ATP-binding site prediction in membrane proteins., (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
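The abstract above describes feeding Position-Specific Scoring Matrix (PSSM) information into a two-dimensional convolutional neural network. As a rough illustration only, the sketch below shows one common way to turn an L x 20 PSSM into per-residue 2D inputs with a sliding window. The window size, the zero padding, and the function name `pssm_windows` are assumptions made for this example and are not taken from the DeepATP paper.

```python
import numpy as np

def pssm_windows(pssm, window=17):
    """Return one (window, 20) slice per residue, zero-padded at both ends.

    `window` is a hypothetical setting; the abstract does not state one.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so each residue sits at the centre")
    half = window // 2
    # Pad only along the sequence axis so terminal residues get full windows.
    padded = np.pad(pssm, ((half, half), (0, 0)), mode="constant")
    return np.stack([padded[i:i + window] for i in range(pssm.shape[0])])

# Toy PSSM for a 5-residue sequence (values are placeholders, not real scores).
pssm = np.arange(5 * 20, dtype=float).reshape(5, 20)
windows = pssm_windows(pssm, window=3)
print(windows.shape)
```

Each slice could then be treated as a single-channel image for a 2D CNN, with the label taken from the central residue (binding or non-binding).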
34. The neural substrate of self- and other-concerned wellbeing: An fMRI study.
- Author
-
Jo H, Ou YY, and Kung CC
- Subjects
- Adult, Female, Humans, Male, Psychophysiology, Gyrus Cinguli diagnostic imaging, Gyrus Cinguli physiology, Happiness, Magnetic Resonance Imaging, Neural Pathways diagnostic imaging, Neural Pathways physiology
- Abstract
Happiness, or Subjective Well-Being (SWB), is generally considered a peaceful and satisfied state accompanied by a consistent and optimistic mood. Due to its subjective and elusive nature, however, wellbeing has only rarely been investigated in the neuroimaging literature. In this study, we investigated its neural substrates by characterizing two different perspectives: self- or other-concerned wellbeing. Twenty-two participants rated their subjective happiness (with button presses 1 to 4) in response to 3 categories (intra-personal, inter-personal and neutral) of pre-rated pictures in a slow event-related fMRI design. Because wellbeing is consistently characterized by pleasant feelings after self-inspection, we predicted that happier conditions, captured by the "intra-personal vs. neutral" and "inter-personal vs. neutral" contrasts, should yield higher BOLD activity in overlapping reward- and self-related regions. Indeed, medial prefrontal cortex (mPFC), pregenual ACC (pACC), precuneus and posterior cingulate cortex (PCC) were revealed both by General Linear Model (GLM) categorical contrasts and by parametric modulations (correlations with ratings 1-4); additional psychophysiological interaction (PPI) analyses specifically showed more connectivity between nucleus accumbens (NAcc) and mPFC. More interestingly, GLM and multivariate searchlight analyses jointly revealed a subdivision of mPFC and PCC/precuneus, with anterior mPFC and dorsal PCC/precuneus more involved in interpersonal SWB, and posterior mPFC and ventral PCC/precuneus more involved in intrapersonal SWB. Taken together, these results are not only consistent with the "cortical midline hypothesis of the self", but also extend the "spatial gradients of self-to-other-concerned processing" from mPFC alone to both mPFC and PCC/precuneus, making them two "hubs" of the self-to-other-concerned wellbeing network., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2019
35. Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters.
- Author
-
Nguyen TT, Le NQ, Ho QT, Phan DV, and Ou YY
- Subjects
- Amino Acid Sequence, Humans, Natural Language Processing, Substrate Specificity, Support Vector Machine, Computational Biology methods, Membrane Transport Proteins chemistry
- Abstract
Membrane transport proteins and their substrate specificities play crucial roles in various cellular functions. Identifying the substrate specificities of membrane transport proteins is closely related to protein-target interaction prediction, drug design, membrane recruitment, and dysregulation analysis, making it an important problem for bioinformatics researchers. In this study, we applied the word embedding approach, a main driver of the recent breakthroughs in natural language processing, to protein sequences of transporters. We defined each protein sequence based on the word embeddings and frequencies of its biological words. The protein features were then fed into machine learning models for prediction. We also varied the length of the constituent biological words to find the optimal length, which generated the most discriminative feature set. Compared to four other feature types created from protein sequences, our proposed features help prediction models yield superior performance. Our best models reach an average area under the curve of 0.96 and 0.99 on the 5-fold cross validation and the independent test, respectively. With this result, our study can help biologists identify transporters based on substrate specificities and provides a basis for further research that enriches the field of applying natural language processing techniques in bioinformatics., (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
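The abstract above represents each protein by the "biological words" it contains and their frequencies. A minimal sketch of that idea follows, assuming that a biological word is an overlapping k-mer of residues. The abstract does not say whether the words overlap, and the functions `biological_words` and `word_frequencies` are hypothetical names introduced for illustration; the embedding step itself is not shown.

```python
from collections import Counter

def biological_words(sequence, k=3):
    """Split a sequence into overlapping k-mers (assumed definition of a 'word')."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def word_frequencies(sequence, k=3):
    """Relative frequency of each k-mer; empty dict if the sequence is shorter than k."""
    words = biological_words(sequence, k)
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}

# Toy sequence, not a real transporter.
print(biological_words("MKVLA", k=3))
print(word_frequencies("AAAA", k=2))
```

Varying `k` here corresponds to the abstract's search over word lengths for the most discriminative feature set.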
36. iMotor-CNN: Identifying molecular functions of cytoskeleton motor proteins using 2D convolutional neural network via Chou's 5-step rule.
- Author
-
Le NQK, Yapp EKY, Ou YY, and Yeh HY
- Subjects
- Algorithms, Humans, Machine Learning, Cytoskeletal Proteins physiology, Molecular Motor Proteins physiology, Neural Networks, Computer
- Abstract
Motor proteins are the driving force behind muscle contraction and are responsible for the active transportation of most proteins and vesicles in the cytoplasm. There are three superfamilies of cytoskeletal motor proteins with various molecular functions and structures: dynein, kinesin, and myosin. The functional loss of a specific motor protein molecular function has been linked to a variety of human diseases, e.g., Charcot-Marie-Tooth disease and kidney disease. Therefore, creating a precise model to classify motor proteins is essential for helping biologists understand their molecular functions and design drug targets according to their impact on human diseases. Here we attempt to classify cytoskeleton motor proteins using deep learning, which has been increasingly and widely used to address numerous problems in a variety of fields, with state-of-the-art results. Our deep convolutional neural network achieves an independent test accuracy of 97.5%, 96.4%, and 96.1% for each superfamily, respectively. Compared to other state-of-the-art methods, our approach showed a significant improvement in performance across a range of evaluation metrics. Through the proposed study, we provide an effective model for classifying motor proteins and a basis for further research that can enhance the performance of protein function classification using deep learning., (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
37. DeepIon: Deep learning approach for classifying ion transporters and ion channels from membrane proteins.
- Author
-
Taju SW and Ou YY
- Subjects
- Automation, Humans, Ion Transport, Deep Learning, Ion Channels chemistry, Ion Channels classification
- Abstract
The movement of ions across the cell membrane is essential for many biological processes. This study focuses on ion channels and ion transporters (pumps), which act as border guards that control the incessant traffic of ions across cell membranes. Ion channels and ion transporters regulate membrane potential and electrical signaling and play important roles in cell proliferation, migration, apoptosis, and differentiation. Ion channels have been found to differ significantly from ion transporters in their behavior. Therefore, a method for automatically classifying ion transporters and ion channels from membrane proteins is proposed, which trains deep neural networks using the position-specific scoring matrix profile as input. The key novelty is the three-stage approach: five techniques for data normalization are used; next, three imbalanced data techniques are applied to the minority classes; and then six classifiers are compared with the proposed method., (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
38. iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding.
- Author
-
Le NQK, Yapp EKY, Ho QT, Nagasundaram N, Ou YY, and Yeh HY
- Subjects
- Humans, Sequence Analysis, DNA, Computational Biology, DNA genetics, Enhancer Elements, Genetic genetics, Support Vector Machine
- Abstract
An enhancer is a short (50-1500 bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. Although a few computational tools have been employed to address this problem, their performance still requires improvement. In this study, we represent enhancers by word embeddings, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm for classification. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We attain an independent test accuracy of 79% and 63.5% in the two layers, respectively. Compared to current predictors on the same dataset, our proposed method yields superior performance. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/., (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
39. Using two-dimensional convolutional neural networks for identifying GTP binding sites in Rab proteins.
- Author
-
Le NQK, Ho QT, and Ou YY
- Subjects
- Amino Acid Sequence, Amino Acids analysis, Binding Sites, Computational Biology methods, Databases, Protein statistics & numerical data, Deep Learning, Humans, rab GTP-Binding Proteins genetics, Guanosine Triphosphate metabolism, Neural Networks, Computer, rab GTP-Binding Proteins chemistry, rab GTP-Binding Proteins metabolism
- Abstract
Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict GTP binding sites in Rab proteins, which is one of the most vital molecular functions in life science. A functional loss of GTP binding sites in Rab proteins has been implicated in a variety of human diseases (choroideremia, intellectual disability, cancer, Parkinson's disease). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases and designing drug targets. Our deep learning model, which uses a two-dimensional convolutional neural network and position-specific scoring matrix profiles, identified GTP binding residues with a sensitivity of 92.3%, specificity of 99.8%, accuracy of 99.5%, and MCC of 0.92 on an independent dataset. Compared with other published works, this approach achieved a significant improvement. Through the proposed study, we provide an effective model for predicting GTP binding sites in Rab proteins and a basis for further research that can apply deep learning in bioinformatics, especially in nucleotide binding site prediction.
- Published
- 2019
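Several abstracts in this listing report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). For readers comparing these figures, the sketch below computes all four from a binary confusion matrix using the standard definitions. The function name `binary_metrics` and the example counts are made up for illustration and are not data from any of the listed studies.

```python
import math

def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }

# Hypothetical counts for a heavily imbalanced test set.
metrics = binary_metrics(tp=8, tn=980, fp=10, fn=2)
print(metrics)
```

Because binding residues are usually a small minority, accuracy and specificity can both be high while MCC stays moderate, which is why these abstracts tend to report MCC alongside accuracy.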
40. Incorporating post translational modification information for enhancing the predictive performance of membrane transport proteins.
- Author
-
Le NQK, Sandag GA, and Ou YY
- Subjects
- Amino Acid Sequence, Animals, Computer Simulation, Humans, Membrane Transport Proteins chemistry, Membrane Transport Proteins classification, Models, Biological, Neural Networks, Computer, Membrane Transport Proteins metabolism, Protein Processing, Post-Translational
- Abstract
Transporters are involved in the cellular entry and exit of ions or molecules across the membrane and thereby play an essential role in recognizing the immune system and energy transducers. Given their relevance in proteomics, numerous studies have analyzed transporters, especially the discrimination of their classes and subfamilies. We realized that post translational modification (PTM) information has a critical role in the process of transport proteins. Therefore, in this study, we aim to incorporate PTM information with radial basis function networks to improve the predictive performance for transport proteins in three major classes (channels/pores, electrochemical transporters, and active transporters) and six different families (α-type channels, β-barrel porins, pore-forming toxins, porters, PP bond hydrolysis-driven transporters, and oxidoreduction-driven transporters). Using PSSM profiles combined with PTM information, the transporters could be classified into three classes and six families with five-fold cross-validation accuracy of 87.6% and 92.5%, respectively. For the independent dataset of 444 proteins, the performance with PTM information attained an accuracy of 82.13% and 89.34% for classifying three classes and six families, respectively. Compared with other methods and previous works, our result shows better predictive performance, with an accuracy improvement of 12%. We suggest that our study could become a robust model for biologists to discriminate transport proteins with high performance and better understand the function of transport proteins. Further, the contributions of this study could be fundamental for further research that uses PTM information to address numerous computational biology problems., (Copyright © 2018 Elsevier Ltd. All rights reserved.)
- Published
- 2018
41. DeepEfflux: a 2D convolutional neural network model for identifying families of efflux proteins in transporters.
- Author
-
Taju SW, Nguyen TT, Le NQ, Kusuma RMI, and Ou YY
- Subjects
- Algorithms, Machine Learning, Neural Networks, Computer, Position-Specific Scoring Matrices, Protein Transport, Proteins metabolism, Software, Proteins chemistry
- Abstract
Motivation: Efflux proteins play a key role in pumping xenobiotics out of cells. The prediction of efflux family proteins involved in the transport of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considering the pump specificity of efflux protein families. In other words, efflux proteins protect cells through the extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. Motif sequences consisting of the same number of residues have high degrees of residue similarity and thus affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine the energy dependencies of efflux protein families. In order to efficiently identify efflux protein families while considering pump specificity, we developed a 2D convolutional neural network (2D CNN) model called DeepEfflux. DeepEfflux tries to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, one for each family of efflux protein, were fed into DeepEfflux, and a 5-fold cross validation approach was then used to evaluate the training performance., Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, in the independent test results shows that our model performs well and can be used as a reliable tool for identifying families of efflux proteins in transporters., Availability and Implementation: The online version of deepefflux is available at http://deepefflux.irit.fr. The source code of deepefflux is available both on the deepefflux website and at http://140.138.155.216/deepefflux/., Supplementary Information: Supplementary data are available at Bioinformatics online.
- Published
- 2018
42. Classifying the molecular functions of Rab GTPases in membrane trafficking using deep convolutional neural networks.
- Author
-
Le NQ, Ho QT, and Ou YY
- Subjects
- Humans, Protein Transport, Cell Membrane metabolism, Choroideremia metabolism, Intellectual Disability metabolism, Machine Learning, Models, Biological, Neoplasm Proteins metabolism, Neoplasms metabolism, Neural Networks, Computer, rab GTP-Binding Proteins metabolism
- Abstract
Deep learning has been increasingly used to solve a number of problems with state-of-the-art performance in a wide variety of fields. In biology, deep learning can be applied to reduce feature extraction time and achieve high levels of performance. In our present work, we apply deep learning via two-dimensional convolutional neural networks and position-specific scoring matrices to classify Rab protein molecules, which are main regulators in membrane trafficking for transferring proteins and other macromolecules throughout the cell. The functional loss of specific Rab molecular functions has been implicated in a variety of human diseases, e.g., choroideremia, intellectual disabilities, cancer. Therefore, creating a precise model for classifying Rabs is crucial in helping biologists understand the molecular functions of Rabs and design drug targets according to such specific human disease information. We constructed a robust deep neural network for classifying Rabs that achieved an accuracy of 99%, 99.5%, 96.3%, and 97.6% for each of four specific molecular functions. Our approach demonstrates superior performance to traditional artificial neural networks. Therefore, from our proposed study, we provide both an effective tool for classifying Rab proteins and a basis for further research that can improve the performance of biological modeling using deep neural networks., (Copyright © 2018 Elsevier Inc. All rights reserved.)
- Published
- 2018
43. Incorporating deep learning with convolutional neural networks and position specific scoring matrices for identifying electron transport proteins.
- Author
-
Le NQ, Ho QT, and Ou YY
- Abstract
In recent years, deep learning has become a modern machine learning technique used in a variety of fields with state-of-the-art performance. Therefore, using deep learning to enhance performance is also an important solution for the current bioinformatics field. In this study, we use deep learning via convolutional neural networks and position specific scoring matrices to identify electron transport proteins, which carry out an important molecular function in transmembrane proteins. Our deep learning method yields a precise model for identifying electron transport proteins, achieving a sensitivity of 80.3%, specificity of 94.4%, accuracy of 92.3%, and MCC of 0.71 on an independent dataset. The proposed technique can serve as a powerful tool for identifying electron transport proteins and can help biologists understand the function of the electron transport proteins. Moreover, this study provides a basis for further research that can enrich the field of applying deep learning in bioinformatics., (© 2017 Wiley Periodicals, Inc.)
- Published
- 2017
44. Polymyxin B as an inhibitor of lipopolysaccharides contamination of herb crude polysaccharides in mononuclear cells.
- Author
-
Lu XX, Jiang YF, Li H, Ou YY, Zhang ZD, DI HY, Chen DF, and Zhang YY
- Subjects
- Animals, Bupleurum chemistry, Drug Contamination, Drugs, Chinese Herbal analysis, Lipopolysaccharides analysis, Macrophages metabolism, Mice, Nitric Oxide metabolism, Polymyxin B pharmacology, Polysaccharides analysis, Tumor Necrosis Factor-alpha metabolism, Drugs, Chinese Herbal pharmacology, Lipopolysaccharides antagonists & inhibitors, Macrophages drug effects, Polymyxin B analysis, Polysaccharides pharmacology
- Abstract
Lipopolysaccharides (LPS) contamination in herbal crude polysaccharides is inevitable. The present study was performed to explore the effect of polymyxin B on abolishing the influence of LPS contamination in mononuclear cells. LPS was pretreated with polymyxin B sulfate (PB) at different concentrations for 1, 5 or 24 h, and then used to stimulate RAW264.7 and mouse peritoneal macrophages (MPMs). The nitric oxide (NO) and tumor necrosis factor-α (TNF-α) in cell culture supernatant, as the indications of cell response, were assayed. Bupleurum chinensis polysaccharides (BCPs) with trace amount contamination of LPS was treated with PB. 30 μg·mL⁻¹ of PB, treating LPS (10 and 1 000 ng·mL⁻¹ in stimulating RAW264.7 and MPMs respectively) at 37 °C for 24 h, successfully abolished the stimulating effect of LPS on the cells. When the cells were stimulated with LPS, BCPs further promoted NO production. However, pretreated with PB, BCPs showed a suppression of NO production in MPMs and no change in RAW264.7. In the in vitro experiments, LPS contamination in polysaccharide might bring a great interference in assessing the activity of drug. Pretreatment with PB (30 μg·mL⁻¹) at 37 °C for 24 h was sufficient to abolish the effects of LPS contamination (10 and 1 000 ng·mL⁻¹)., (Copyright © 2017 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.)
- Published
- 2017
45. Identifying the molecular functions of electron transport proteins using radial basis function networks and biochemical properties.
- Author
-
Le NQ, Nguyen TT, and Ou YY
- Subjects
- Algorithms, Amino Acids chemistry, Electron Transport, Carrier Proteins metabolism
- Abstract
The electron transport proteins have an important role in storing and transferring electrons in cellular respiration, which is the most efficient process through which cells gather energy from consumed food. According to their molecular functions, the electron transport chain components can be grouped into five complexes with several different electron carriers and functions. Therefore, identifying the molecular functions in the electron transport chain is vital for helping biologists understand the electron transport chain process and energy production in cells. This work includes two phases: discriminating electron transport proteins from transport proteins, and classifying electron transport proteins into the five complexes. In the first phase, PSSM combined with the AAIndex feature set successfully identified electron transport proteins among transport proteins, achieving a sensitivity of 73.2%, specificity of 94.1%, accuracy of 91.3%, and MCC of 0.64 on an independent data set. In the second phase, our method yields a precise model for identifying the five complexes with different molecular functions in electron transport proteins. PSSM with AAIndex properties achieved MCCs of 0.51, 0.47, 0.42, 0.74, and 1.00 for the five complexes on the independent data set, respectively. We suggest that our study could be a powerful model for determining which molecular function of electron transport proteins a new protein belongs to., (Copyright © 2017 Elsevier Inc. All rights reserved.)
- Published
- 2017
46. Polysaccharides from Arnebia euchroma Ameliorated Endotoxic Fever and Acute Lung Injury in Rats Through Inhibiting Complement System.
- Author
-
Ou YY, Jiang Y, Li H, Zhang YY, Lu Y, and Chen DF
- Subjects
- Acute Lung Injury chemically induced, Animals, Complement Inactivating Agents pharmacology, Complement System Proteins drug effects, Fever chemically induced, Lipopolysaccharides, Medicine, Chinese Traditional methods, Phytotherapy methods, Polysaccharides administration & dosage, Rats, Acute Lung Injury drug therapy, Boraginaceae chemistry, Complement Activation drug effects, Fever drug therapy, Polysaccharides therapeutic use
- Abstract
Arnebia euchroma (Royle) Johnst (Ruanzicao) is a traditional Chinese herbal medicine (TCM). It is extensively used in China and other countries for the treatment of inflammatory diseases. A hyper-activated complement system is known to be involved in fever and acute lung injury (ALI) in rats. In our preliminary studies, the anti-complementary activity of crude Arnebia euchroma polysaccharides (CAEP) was demonstrated in vitro. This study aimed to investigate the role and mechanism of CAEP using two animal models related to inappropriate activation of the complement system. In the lipopolysaccharide (LPS)-induced fever model, the body temperature and peripheral blood leukocytes of rats were significantly increased, while serum complement levels were remarkably decreased. CAEP administration alleviated the LPS-induced fever, reduced the number of leukocytes, and improved the levels of complement. Histological assay showed severe damage and complement deposition in the lungs of the ALI rats. Further detection showed that oxidant stress was enhanced, and that total hemolytic activity and serum C3/C4 levels were significantly decreased, in the ALI model group. Remarkably, CAEP not only attenuated the morphological injury, edema, and permeability in the lung but also significantly weakened the oxidant stress in bronchoalveolar lavage fluid (BALF) in the ALI rats. The levels of complement and complement depositions were improved by the CAEP treatment. In conclusion, the CAEP treatment ameliorated the febrile response induced by LPS and the acute lung injury induced by LPS plus ischemia-reperfusion. CAEP exerted beneficial effects on inflammatory disease, potentially by inhibiting the inappropriate activation of the complement system.
- Published
- 2017
47. Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins.
- Author
-
Le NQ and Ou YY
- Subjects
- Amino Acid Sequence, Amino Acids chemistry, Binding Sites, Carrier Proteins chemistry, Guanosine Triphosphate chemistry, Humans, Protein Binding, ROC Curve, Amino Acids metabolism, Carrier Proteins metabolism, Guanosine Triphosphate metabolism, Models, Theoretical
- Abstract
Background: Guanine nucleotide-binding proteins (G-proteins) are known as molecular switches inside cells and are very important in transmitting signals from outside to inside the cell. Especially among transport proteins, most G-proteins play an important role in membrane trafficking, which is necessary for transferring proteins and other molecules to a variety of destinations outside and inside the cell. The function of membrane trafficking is controlled by G-proteins via guanosine triphosphate (GTP) binding sites. GTP binding activates G-proteins, which then act on membrane vesicles by interacting with specific effector proteins. Without the interaction from GTP binding sites, G-proteins could not be active in membrane trafficking, and this consequently causes many diseases, e.g., cancer and Parkinson's disease. Thus it is very important to identify GTP binding sites in membrane trafficking in particular and in transport proteins in general., Results: We developed the proposed model with cross-validation and examined it with an independent dataset. We achieved an accuracy of 95.6% with cross-validation and 98.7% on the independent data set. For newly discovered transport protein sequences, our approach performed remarkably better than similar methods such as GTPBinder, NsitePred and TargetSOS. Moreover, a user-friendly web server for identifying GTP binding sites in transport proteins was developed and is available to all users., Conclusions: We developed a computational technique using PSSM profiles and significant amino acid pairs (SAAPs) for identifying GTP binding residues in transport proteins. When we added SAAPs to PSSM profiles, the predictive performance improved significantly in all measurement metrics. Furthermore, the proposed method could be a powerful tool for identifying GTP binding sites in new transport proteins and can provide useful information for biologists.
- Published
- 2016
48. Prediction of FAD binding sites in electron transport proteins according to efficient radial basis function networks and significant amino acid pairs.
- Author
-
Le NQ and Ou YY
- Subjects
- Algorithms, Amino Acid Sequence, Amino Acids chemistry, Area Under Curve, Binding Sites, Electron Transport Chain Complex Proteins chemistry, Flavin-Adenine Dinucleotide chemistry, Internet, Protein Binding, ROC Curve, User-Computer Interface, Electron Transport Chain Complex Proteins metabolism, Flavin-Adenine Dinucleotide metabolism
- Abstract
Background: Cellular respiration is a catabolic pathway for producing adenosine triphosphate (ATP) and is the most efficient process through which cells harvest energy from consumed food. When cells undergo cellular respiration, they require a pathway to keep and transfer electrons (i.e., the electron transport chain). Due to oxidation-reduction reactions, the electron transport chain produces a transmembrane proton electrochemical gradient. In case protons flow back through this membrane, this mechanical energy is converted into chemical energy by ATP synthase. The convert process is involved in producing ATP which provides energy in a lot of cellular processes. In the electron transport chain process, flavin adenine dinucleotide (FAD) is one of the most vital molecules for carrying and transferring electrons. Therefore, predicting FAD binding sites in the electron transport chain is vital for helping biologists understand the electron transport chain process and energy production in cells., Results: We used an independent data set to evaluate the performance of the proposed method, which had an accuracy of 69.84 %. We compared the performance of the proposed method in analyzing two newly discovered electron transport protein sequences with that of the general FAD binding predictor presented by Mishra and Raghava and determined that the accuracy of the proposed method improved by 9-45 % and its Matthew's correlation coefficient was 0.14-0.5. Furthermore, the proposed method enabled reducing the number of false positives significantly and can provide useful information for biologists., Conclusions: We developed a method that is based on PSSM profiles and SAAPs for identifying FAD binding sites in newly discovered electron transport protein sequences. This approach achieved a significant improvement after we added SAAPs to PSSM features to analyze FAD binding proteins in the electron transport chain. 
The proposed method can serve as an effective tool for predicting FAD binding sites in electron transport proteins and can help biologists understand the functions of the electron transport chain, particularly those of FAD binding sites. We also developed a web server, available to academic users, that identifies FAD binding sites in electron transporters.
- Published
- 2016
49. Complete mitochondrial genome of Papilio syfanius (Lepidoptera: Papilionidae).
- Author
-
Dong Y, Zhu LX, Ding MJ, Wang JJ, Luo LG, Liu Y, and Ou YY
- Subjects
- Animals, Base Sequence, DNA, Mitochondrial genetics, Genome, Insect, Molecular Sequence Data, Sequence Analysis, DNA veterinary, Butterflies genetics, Genome, Mitochondrial
- Abstract
The complete mitochondrial genome (mitogenome) of the swallowtail butterfly Papilio syfanius has been sequenced. It is 15,359 bp long and contains the typical complement of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, and 2 ribosomal RNA (rRNA) genes. Two A + T-rich regions are included in this mitogenome. The nucleotide composition is very similar to that of other insects, showing a high bias towards A + T, especially in the control region (92.8%). Gene order in the P. syfanius mitogenome is essentially identical to that of the inferred ancestral insect genome, with the exception of the translocation of trnM, which is common in the genus Papilio.
- Published
- 2016
50. Houttuynia cordata Thunb. polysaccharides ameliorates lipopolysaccharide-induced acute lung injury in mice.
- Author
-
Xu YY, Zhang YY, Ou YY, Lu XX, Pan LY, Li H, Lu Y, and Chen DF
- Subjects
- Acute Lung Injury chemically induced, Acute Lung Injury immunology, Acute Lung Injury pathology, Animals, Bronchoalveolar Lavage Fluid cytology, Bronchoalveolar Lavage Fluid immunology, Cell Count, Chemotaxis, Complement System Proteins immunology, Cytokines immunology, Lung drug effects, Lung immunology, Lung pathology, Macrophages physiology, Male, Mice, Inbred BALB C, Phytotherapy, Polysaccharides pharmacology, Toll-Like Receptor 4 immunology, Acute Lung Injury drug therapy, Houttuynia, Polysaccharides therapeutic use
- Abstract
Ethnopharmacological Relevance: Houttuynia cordata (HC) has been used as a folk therapy to treat pulmonary infections. This study aimed to determine the role and mechanism of action of polysaccharides isolated from HC (HCP) in lipopolysaccharide (LPS)-induced acute lung injury (ALI) in mice., Materials and Methods: LPS was delivered by the intratracheal route to Balb/c mice 2 h before HCP (40, 80 and 160 mg/kg) administration., Results: The total cell count and the protein and tumor necrosis factor-α (TNF-α) concentrations in bronchoalveolar lavage fluid, the wet/dry weight ratio (w/d) of the lungs, and the pulmonary pathology of each mouse were analyzed; HCP significantly alleviated LPS-induced ALI. Moreover, in the lungs of mice, the infiltration of inflammatory cells, the expression of Toll-like receptor 4, and complement deposition were significantly decreased by HCP treatment. In vitro assays showed that C5a, a complement activation product, induced significant macrophage migration and that treatment with HCP prevented it. The in vitro results also showed that LPS increased the production of nitric oxide and pro-inflammatory cytokines (TNF-α, interleukin-6, and interleukin-1β), and that HCP antagonized these effects of LPS. It was also found that HCP alone augmented secretion of some pro-inflammatory cytokines., Conclusion: These results indicate that HCP may alleviate LPS-induced lung inflammatory injury, which may be associated with its inhibitory effect on the overactivation of complement and macrophages. This suggests a potential role for HCP in treating ALI., (Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.)
- Published
- 2015