9,164 results on '"protein structure prediction"'
Search Results
2. Machine learning for predicting protein properties: A comprehensive review
- Author
- Wang, Yizhen, Zhang, Yanyun, Zhan, Xuhui, He, Yuhao, Yang, Yongfu, Cheng, Li, and Alghazzawi, Daniyal
- Published
- 2024
3. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis
- Author
- Rahimzadeh, Faezeh, Mohammad Khanli, Leyli, Salehpoor, Pedram, Golabi, Faegheh, and PourBahrami, Shahin
- Published
- 2024
4. AlphaFold-assisted structure determination of a bacterial protein of unknown function using X-ray and electron crystallography.
- Author
- Miller, Justin, Agdanowski, Matthew, Dolinsky, Joshua, Sawaya, Michael, Yeates, Todd, Rodriguez, Jose, and Cascio, Duilio
- Subjects
- AlphaFold, bacterial proteins, electron diffraction, molecular replacement, protein structure prediction, Bacterial Proteins, X-Rays, Electrons, Protein Conformation, Crystallography, X-Ray
- Abstract
Macromolecular crystallography generally requires the recovery of missing phase information from diffraction data to reconstruct an electron-density map of the crystallized molecule. Most recent structures have been solved using molecular replacement as a phasing method, requiring an a priori structure that is closely related to the target protein to serve as a search model; when no such search model exists, molecular replacement is not possible. New advances in computational machine-learning methods, however, have resulted in major advances in protein structure predictions from sequence information. Methods that generate predicted structural models of sufficient accuracy provide a powerful approach to molecular replacement. Taking advantage of these advances, AlphaFold predictions were applied to enable structure determination of a bacterial protein of unknown function (UniProtKB Q63NT7, NCBI locus BPSS0212) based on diffraction data that had evaded phasing attempts using MIR and anomalous scattering methods. Using both X-ray and micro-electron (microED) diffraction data, it was possible to solve the structure of the main fragment of the protein using a predicted model of that domain as a starting point. The use of predicted structural models importantly expands the promise of electron diffraction, where structure determination relies critically on molecular replacement.
- Published
- 2024
5. Large language models facilitating modern molecular biology and novel drug development.
- Author
- Liu, Xiao-huan, Lu, Zhen-hua, Wang, Tao, and Liu, Fei
- Abstract
The latest breakthroughs in information technology and biotechnology have catalyzed a revolutionary shift within the modern healthcare landscape, with notable impacts from artificial intelligence (AI) and deep learning (DL). Particularly noteworthy is the adept application of large language models (LLMs), which enable seamless and efficient communication between scientific researchers and AI systems. These models capitalize on neural network (NN) architectures that demonstrate proficiency in natural language processing, thereby enhancing interactions. This comprehensive review outlines the cutting-edge advancements in the application of LLMs within the pharmaceutical industry, particularly in drug development. It offers a detailed exploration of the core mechanisms that drive these models and zeroes in on the practical applications of several models that show great promise in this domain. Additionally, this review delves into the pivotal technical and ethical challenges that arise with the practical implementation of LLMs. There is an expectation that LLMs will assume a more pivotal role in the development of innovative drugs and will ultimately contribute to the accelerated development of revolutionary pharmaceuticals. [ABSTRACT FROM AUTHOR]
- Published
- 2025
6. Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs).
- Author
- Alanazi, Wafa, Meng, Di, and Pollastri, Gianluca
- Abstract
Accurate protein secondary structure prediction (PSSP) is crucial for understanding protein function, which is foundational to advancements in drug development, disease treatment, and biotechnology. Researchers gain critical insights into protein folding and function within cells by predicting protein secondary structures. The advent of deep learning models, capable of processing complex sequence data and identifying meaningful patterns, offers substantial potential to enhance the accuracy and efficiency of protein structure predictions. In particular, recent breakthroughs in deep learning—driven by the integration of natural language processing (NLP) algorithms—have significantly advanced the field of protein research. Inspired by the remarkable success of NLP techniques, this study harnesses the power of pre-trained language models (PLMs) to advance PSSP. We conduct a comprehensive evaluation of various deep learning models trained on distinct sequence embeddings, including one-hot encoding and PLM-based approaches such as ProtTrans and ESM-2, to develop a cutting-edge prediction system optimized for accuracy and computational efficiency. Our proposed model, Porter 6, is an ensemble of CBRNN-based predictors, leveraging the protein language model ESM-2 as input features. Porter 6 achieves outstanding performance on large-scale, independent test sets. On a 2022 test set, the model attains an impressive 86.60% accuracy in three-state (Q3) and 76.43% in eight-state (Q8) classifications. When tested on a more recent 2024 test set, Porter 6 maintains robust performance, achieving 84.56% in Q3 and 74.18% in Q8 classifications. This represents a significant 3% improvement over its predecessor, outperforming or matching state-of-the-art approaches in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2025
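The Q3 and Q8 figures quoted in the Porter 6 abstract are per-residue accuracies: the percentage of residues whose predicted secondary-structure state matches the observed one. The following is a minimal Python sketch of that metric, not the authors' evaluation code. The function names are hypothetical, and the eight-to-three-state reduction shown is one common convention (H, G, I to helix; E, B to strand; everything else to coil); the abstract does not say which mapping Porter 6 uses.

```python
# One common DSSP 8-state -> 3-state reduction (an assumption, not taken from the abstract).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, and coil
}


def reduce_to_three_state(ss8: str) -> str:
    """Map an eight-state secondary-structure string to three states."""
    return "".join(DSSP8_TO_3[state] for state in ss8)


def q_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues with matching predicted and observed states."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example with made-up strings (not data from the paper).
observed8 = "HHHHGGG-EEEE-TT"
predicted8 = "HHHHHHH-EEEE-SS"
q8 = q_accuracy(predicted8, observed8)
q3 = q_accuracy(reduce_to_three_state(predicted8), reduce_to_three_state(observed8))
```

In this toy case the eight-state score is lower than the three-state score because G/H and T/S confusions disappear once the states are merged, which is why Q8 is the harder of the two figures.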
7. PaleAle 6.0: Prediction of Protein Relative Solvent Accessibility by Leveraging Pre-Trained Language Models (PLMs).
- Author
- Alanazi, Wafa, Meng, Di, and Pollastri, Gianluca
- Abstract
Predicting the relative solvent accessibility (RSA) of a protein is critical to understanding its 3D structure and biological function. RSA prediction, especially when homology transfer cannot provide information about a protein's structure, is a significant step toward addressing the protein structure prediction challenge. Today, deep learning is arguably the most powerful method for predicting RSA and other structural features of proteins. In particular, recent breakthroughs in deep learning—driven by the integration of natural language processing (NLP) algorithms—have significantly advanced the field of protein research. Inspired by the remarkable success of NLP techniques, this study leverages pre-trained language models (PLMs) to enhance RSA prediction. We present a deep neural network architecture based on a combination of bidirectional recurrent neural networks and convolutional layers that can analyze long-range interactions within protein sequences and predict protein RSA using ESM-2 encoding. The final predictor, PaleAle 6.0, predicts RSA in real values as well as two-state (exposure threshold of 25%) and four-state (exposure thresholds of 4%, 25%, and 50%) discrete classifications. On the 2022 test set dataset, PaleAle 6.0 achieved over 82% accuracy for two-state RSA (RSA_2C) and 59.75% accuracy for four-state RSA (RSA_4C), with a Pearson correlation coefficient (PCC) of 77.88 for real-value RSA prediction. When evaluated on the more challenging 2024 test set, PaleAle 6.0 maintained a strong performance, achieving 79.74% accuracy in the two-state prediction and 55.30% accuracy in the four-state prediction, with a PCC of 73.08 for real-value predictions, outperforming all previously benchmarked predictors. [ABSTRACT FROM AUTHOR]
- Published
- 2025
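The PaleAle 6.0 abstract describes two discrete RSA classifications: two-state with an exposure threshold of 25%, and four-state with thresholds of 4%, 25%, and 50%. The Python sketch below shows how a real-valued RSA percentage could be binned under those thresholds. It is an illustration only: the function and constant names are hypothetical, and the handling of values that fall exactly on a threshold (assigned here to the more exposed class) is an assumption the abstract does not specify.

```python
from bisect import bisect_right

# Thresholds in percent, as quoted in the abstract.
TWO_STATE_THRESHOLDS = (25.0,)
FOUR_STATE_THRESHOLDS = (4.0, 25.0, 50.0)


def rsa_class(rsa_percent: float, thresholds: tuple) -> int:
    """Return the class index (0 = most buried) for an RSA value in percent.

    A value equal to a threshold goes to the more exposed class; this
    boundary convention is an assumption made for illustration.
    """
    if rsa_percent < 0.0:
        raise ValueError("RSA cannot be negative")
    # bisect_right counts how many thresholds are <= rsa_percent.
    return bisect_right(thresholds, rsa_percent)


# Toy values (not data from the paper).
two_state = [rsa_class(v, TWO_STATE_THRESHOLDS) for v in (3.0, 24.9, 25.0, 80.0)]
four_state = [rsa_class(v, FOUR_STATE_THRESHOLDS) for v in (3.0, 10.0, 30.0, 80.0)]
```

Using a sorted threshold tuple keeps the two-state and four-state schemes in one function, so only the constants change between them.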
8. New insights into the evolution and function of the UMAMIT (USUALLYMULTIPLEACIDSMOVEINANDOUTTRANSPORTER) gene family.
- Author
- Cao, Chenhao, Qiu, Xinbao, Yang, Zhongnan, and Jin, Yue
- Subjects
- *AMINO acid transport, *PROTEIN structure prediction, *LIFE sciences, *CYTOLOGY, *DISEASE resistance of plants
- Abstract
UMAMIT proteins have been known as key players in amino acid transport. In Arabidopsis, functions of several UMAMITs have been characterized, but their precise mechanism, evolutionary history and functional divergence remain elusive. In this study, we conducted phylogenetic analysis of the UMAMIT gene family across key species in the evolutionary history of plants, ranging from algae to angiosperms. Our findings indicate that UMAMIT proteins underwent a substantial expansion from algae to angiosperms, accompanied by the stabilization of the EamA (the main domain of UMAMIT) structure. Phylogenetic studies suggest that UMAMITs may have originated from green algae and be divided into four subfamilies. These proteins first diversified in bryophytes and subsequently experienced gene duplication events in seed plants. Subfamily I was potentially associated with amino acid transport in seeds. Regarding subcellular localization, UMAMITs were predominantly localized in the plasma membrane and chloroplasts. However, members from clade 8 in subfamily III exhibited specific localization in the tonoplast. These members may have multiple functions, such as plant disease resistance and root development. Furthermore, our protein structure prediction revealed that the four-helix bundle motif is crucial in controlling the UMAMIT switch for exporting amino acid. We hypothesize that the specific amino acids in the amino acid binding region determine the type of amino acids being transported. Additionally, subfamily II contains genes that are specifically expressed in reproductive organs and roots in angiosperms, suggesting neofunctionalization. Our study highlights the evolutionary complexity of UMAMITs and underscores their crucial role in the adaptation and diversification of seed plants. [ABSTRACT FROM AUTHOR]
- Published
- 2025
9. AI-driven mechanistic analysis of conformational dynamics in CNNM/CorC Mg2+ transporters.
- Author
- Ma, Jie, Song, Xingyu, Funato, Yosuke, Teng, Xinyu, Huang, Yichen, Miki, Hiroaki, Wang, Wenning, and Hattori, Motoyuki
- Subjects
- *PROTEIN structure prediction, *HYDROPHILIC interactions, *MOLECULAR dynamics, *TRANSMEMBRANE domains, *CONFORMATIONAL analysis
- Abstract
The CNNM/CorC Mg2+ transporters are widely conserved in eukaryotes (cyclin M [CNNM]) and prokaryotes (CorC) and participate in various biological processes. Previous structural analyses of the CorC transmembrane domain in the Mg2+-bound inward-facing conformation revealed the conserved Mg2+ recognition mechanism in the CNNM/CorC family; however, the conformational dynamics in the Mg2+ transport cycle remain unclear because structures in other conformations are unknown. Here, we used AlphaFold structure prediction to predict the occluded-like and outward-facing-like conformations of the CorC and CNNM proteins and identified conserved hydrophilic interactions close to the cytoplasmic side in these conformations. Molecular dynamics simulations and biochemical cross-linking showed that these conserved hydrophilic interactions are stable, especially in the outward-facing-like conformation. Furthermore, mutational analysis revealed that the residues involved in these hydrophilic interactions on the cytoplasmic side are important for Mg2+ transport in the CorC and CNNM proteins. Our work provides mechanistic insights into the transport cycle of the CNNM/CorC family.
• AlphaFold predicted the occluded and outward-facing conformations of CNNM and CorC
• The conserved hydrophilic interactions close to the cytoplasmic side were predicted
• The predicted structures were validated by MD simulations and functional analyses
• A transport cycle mechanism for CNNM/CorC Mg2+ transporters was proposed
The CNNM/CorC Mg2+ transporters are widely conserved in eukaryotes and prokaryotes and participate in various biological processes. Ma et al. used AlphaFold to predict their occluded-like and outward-facing-like conformations in the absence of known experimental structures. The predicted structures were subsequently validated by MD simulations and functional analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2025
10. How the technologies behind self‐driving cars, social networks, ChatGPT, and DALL‐E2 are changing structural biology.
- Author
- Bochtler, Matthias
- Subjects
- *ARTIFICIAL neural networks, *LANGUAGE models, *PROTEIN structure prediction, *COMPUTER vision, *CONVOLUTIONAL neural networks, *DEEP learning
- Abstract
The performance of deep Neural Networks (NNs) in the text (ChatGPT) and image (DALL‐E2) domains has attracted worldwide attention. Convolutional NNs (CNNs), Large Language Models (LLMs), Denoising Diffusion Probabilistic Models (DDPMs)/Noise Conditional Score Networks (NCSNs), and Graph NNs (GNNs) have impacted computer vision, language editing and translation, automated conversation, image generation, and social network management. Proteins can be viewed as texts written with the alphabet of amino acids, as images, or as graphs of interacting residues. Each of these perspectives suggests the use of tools from a different area of deep learning for protein structural biology. Here, I review how CNNs, LLMs, DDPMs/NCSNs, and GNNs have led to major advances in protein structure prediction, inverse folding, protein design, and small molecule design. This review is primarily intended as a deep learning primer for practicing experimental structural biologists. However, extensive references to the deep learning literature should also make it relevant to readers who have a background in machine learning, physics or statistics, and an interest in protein structural biology. [ABSTRACT FROM AUTHOR]
- Published
- 2025
11. Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction.
- Author
- Bahai, Akash, Kwoh, Chee Keong, Mu, Yuguang, and Li, Yinghui
- Subjects
- *PROTEIN structure prediction, *TERTIARY structure, *SEQUENCE alignment, *MACHINE learning, *RNA, *DEEP learning
- Abstract
The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite advancements, the accuracy of computational methods remains modest, especially when compared to protein structure prediction. Deep learning methods, while successful in protein structure prediction, have shown some promise for RNA structure prediction as well, but face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors influencing performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that generally ML-based methods perform much better than non-ML methods on most RNA targets, although the performance difference isn't substantial when working with unseen novel or synthetic RNAs. The quality of the MSA and secondary structure prediction both play an important role and most methods aren't able to predict non-Watson-Crick pairs in the RNAs. Overall among the automated 3D RNA structure prediction methods, DeepFoldRNA has the best prediction followed by DRFold as the second best method. Finally, we also suggest possible mitigations to improve the quality of the prediction for future method development.
Author summary:
• Systematic benchmarking of the five latest deep-learning and two fragment-assembly (FA) based methods on diverse datasets.
• Compiled a new balanced dataset with the latest RNA structures for benchmarking.
• Generally, the ML-based methods outperform the traditional fragment-assembly based methods, with DeepFoldRNA having the best predicted models overall.
• On orphan RNAs, the ML-based methods are only slightly better than FA-based methods, and generally all methods perform poorly.
• The performance of the methods is dependent on the MSA depth, RNA type, and secondary structure.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
12. ITGB3 is reduced in pregnancies with preeclampsia and its influence on biological behavior of trophoblast cells.
- Author
- Li, Chunyan, Meng, Yanan, Zhou, Beibei, Zhang, Yanrong, Xia, Qing, Huang, Yu, Meng, Li, Shan, Chunjian, Xia, Jiaai, Zhang, Xiangdi, Wang, Qiuhong, Lv, Mingming, and Long, Wei
- Subjects
- *CYTOLOGY, *LIFE sciences, *PROTEIN structure prediction, *PREGNANCY complications, *GENE expression, *PLACENTAL growth factor
- Abstract
Background: Preeclampsia (PE) is a serious pregnancy complication associated with impaired trophoblast function. Integrin β3 (ITGB3) is a cell adhesion molecule that plays a role in cell movement. The objective of this study was to identify the biological function and expression level of ITGB3 in PE. Methods: Cell proliferation, migration, invasion, adhesion, and apoptosis were estimated by CCK8 assay, transwell, scratch assays, and flow cytometry, respectively. The expression levels of ITGB3 were determined by qRT-PCR, western blot, and immunohistochemistry (IHC). Co-immunoprecipitation and Alphafold-Multimer protein complex structure prediction software were employed to identify the molecules that interact with ITGB3. Results: Cell functional experiments conducted on HTR8/SVneo cells demonstrated that ITGB3 significantly enhanced proliferation, migration, invasion, and adhesion, while simultaneously inhibiting apoptosis. Relative ITGB3 expression levels were observed to be lower in PE placental tissue than in normal tissue and similarly reduced in hypoxic HTR8/SVneo cells. RNA-sequencing data from PE placental samples in the GEO database were analyzed to identify differentially expressed genes associated with the disease. We identified a total of 1460 mRNAs that were significantly differentially expressed in PE patients. Specifically, 798 mRNAs were significantly upregulated, and 662 mRNAs were significantly downregulated. Notably, the ITGB3 exhibited a pronounced down-regulation among the differential expression mRNA. Conclusions: This study suggested that ITGB3 plays an important role in promoting the proliferative, migratory, invasive, and adhesive capabilities of trophoblast cells. These findings may facilitate a more in-depth understanding of the molecular mechanisms that promote PE progression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Prostruc: an open-source tool for 3D structure prediction using homology modeling.
- Author
- Pawar, Shivani V., Banini, Wilson Sena Kwaku, Shamsuddeen, Musa Muhammad, Jumah, Toheeb A., Dolling, Nigel N. O., Tiamiyu, Abdulwasiu, and Awe, Olaitan I.
- Subjects
- *PROTEIN structure prediction, *PROTEIN structure, *AMINO acid sequence, *BANKING industry, *SEQUENCE alignment
- Abstract
Introduction: Homology modeling is a widely used computational technique for predicting the three-dimensional (3D) structures of proteins based on known templates and evolutionary relationships, providing structural insights critical for understanding protein function, interactions, and potential therapeutic targets. However, existing tools often require significant expertise and computational resources, presenting a barrier for many researchers. Methods: Prostruc is a Python-based homology modeling tool designed to simplify protein structure prediction through an intuitive, automated pipeline. Integrating Biopython for sequence alignment, BLAST for template identification, and ProMod3 for structure generation, Prostruc streamlines complex workflows into a user-friendly interface. The tool enables researchers to input protein sequences, identify homologous templates from databases such as the Protein Data Bank (PDB), and generate high-quality 3D structures with minimal computational expertise. Prostruc implements a two-stage validation process: first, it uses TM-align for structural comparison, assessing root-mean-square deviation (RMSD) and TM-scores against reference models. Second, it evaluates model quality via QMEANDisCo to ensure high accuracy. Results: The top five models are selected based on these metrics and provided to the user. Prostruc stands out by offering scalability, flexibility, and ease of use. It is accessible via a cloud-based web interface or as a Python package for local use, ensuring adaptability across research environments. Benchmarking against existing tools like SWISS-MODEL, I-TASSER, and Phyre2 demonstrates Prostruc's competitive performance in terms of structural accuracy and job runtime, while its open-source nature encourages community-driven innovation. Discussion: Prostruc is positioned as a significant advancement in homology modeling, making high-quality protein structure prediction more accessible to the scientific community. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Identification, characterization, and design of plant genome sequences using deep learning.
- Author
- Wang, Zhenye, Yuan, Hao, Yan, Jianbing, and Liu, Jianxiao
- Subjects
- *LANGUAGE models, *GENERATIVE adversarial networks, *TRANSCRIPTION factors, *PROTEIN structure prediction, *PLANT genomes, *DEEP learning
- Abstract
SUMMARY Due to its excellent performance in processing large amounts of data and capturing complex non‐linear relationships, deep learning has been widely applied in many fields of plant biology. Here we first review the application of deep learning in analyzing genome sequences to predict gene expression, chromatin interactions, and epigenetic features (open chromatin, transcription factor binding sites, and methylation sites) in plants. Then, current motif mining and functional component design and synthesis based on generative adversarial networks, large models, and attention mechanisms are elaborated in detail. The progress of protein structure and function prediction, genomic prediction, and large model applications based on deep learning is also discussed. Finally, this work provides prospects for the future development of deep learning in plants with regard to multiple omics data, algorithm optimization, large language models, sequence design, and intelligent breeding. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. A Structural Proteomics Exploration of Synphilin-1 and Alpha-Synuclein Interaction in Pathogenesis of Parkinson's Disease.
- Author
- Tripathi, Asmita, Mondal, Rajkrishna, Mandal, Malay, Lahiri, Tapobrata, and Pal, Manoj Kumar
- Subjects
- *PROTEIN structure prediction, *PARKINSON'S disease, *PROTEIN structure, *ALPHA-synuclein, *CELLULAR inclusions
- Abstract
Pathological significance of interaction of Synphilin-1 with mutated alpha-synuclein is well known to have serious consequences in causing the formation of inclusion bodies that are linked to Parkinson's disease (PD). Information extracted so far pointed out that specific mutations, A53T, A30P, and E46K, in alpha-synuclein promote such interactions. However, a detailed structural study of this interaction is pending due to the unavailability of the complete structures of the large protein Synphilin-1 of chain length 919 residues and the mutated alpha-synuclein having all the reported specific mutations so far. In this study, a semi-automatic pipeline-based meta-predictor, AlphaLarge, is created to predict high-fidelity structures of large proteins like Synphilin-1 given the limitations of the existing protocols. AlphaLarge recruits a novel augmented AlphaFold model that uses a divide and conquer based strategy on the foundation of a self-sourced template dataset to choose the best structure model through their standard validations. The structure models were re-validated by a Protein Mediated Interaction Analysis (PMIA) formalism that uses the existing structurally relevant information of these proteins. For the training dataset, the new method, AlphaLarge, performed reasonably better than AlphaFold. Also, the new residue- and domain-based structural details of interactions of resultant best structure models of Synphilin-1 and both wild and mutated alpha-synuclein are extracted using PMIA. This result paves the way for better screening of target specific drugs to control the progression of PD, in particular, and research on any kind of pathophysiology involving large proteins of unknown structures, in general. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction.
- Author
- Zhang, Chenyue, Wang, Qinxin, Li, Yiyang, Teng, Anqi, Hu, Gang, Wuyun, Qiqige, and Zheng, Wei
- Subjects
- *PROTEIN structure prediction, *MOLECULAR structure, *LANGUAGE models, *LIFE sciences, *SEQUENCE alignment
- Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. STRPsearch: fast detection of structured tandem repeat proteins.
- Author
- Mozaffari, Soroush, Arrías, Paula Nazarena, Clementel, Damiano, Piovesan, Damiano, Ferrari, Carlo, Tosatto, Silvio C E, and Monzon, Alexander Miguel
- Subjects
- *PROTEIN structure prediction, *TANDEM repeats, *CYTOSKELETAL proteins, *RAPID tooling, *PROTEIN models
- Abstract
Motivation Structured Tandem Repeats Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. These proteins exhibit distinct secondary structures that form repetitive tertiary arrangements, often resulting in large molecular assemblies. Despite highly variable sequences, STRPs can perform important and diverse biological functions, maintaining a consistent structure with a variable number of repeat units. With the advent of protein structure prediction methods, millions of 3D models of proteins are now publicly available. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation remains the most accurate method for detecting and classifying STRPs, making it impracticable to annotate millions of structures. Results We introduce STRPsearch, a novel tool for the rapid identification, classification, and mapping of STRPs. Leveraging manually curated entries from RepeatsDB as the known conformational space of STRPs, STRPsearch uses the latest advances in structural alignment for a fast and accurate detection of repeated structural motifs in proteins, followed by an innovative approach to map units and insertions through the generation of TM-score profiles. STRPsearch is highly scalable, efficiently processing large datasets, and can be applied to both experimental structures and predicted models. In addition, it demonstrates superior performance compared to existing tools, offering researchers a reliable and comprehensive solution for STRP analysis across diverse proteomes. Availability and implementation STRPsearch is coded in Python. All scripts and associated documentation are available from: https://github.com/BioComputingUP/STRPsearch. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Improved prediction of post-translational modification crosstalk within proteins using DeepPCT.
- Author
- Huang, Yu-Xiang and Liu, Rong
- Subjects
- *GRAPH neural networks, *MACHINE learning, *PROTEIN structure prediction, *POST-translational modification, *DEEP learning, *RANDOM forest algorithms
- Abstract
Motivation Post-translational modification (PTM) crosstalk events play critical roles in biological processes. Several machine learning methods have been developed to identify PTM crosstalk within proteins, but the accuracy is still far from satisfactory. Recent breakthroughs in deep learning and protein structure prediction could provide a potential solution to this issue. Results We proposed DeepPCT, a deep learning algorithm to identify PTM crosstalk using AlphaFold2-based structures. In this algorithm, one deep learning classifier was constructed for sequence-based prediction by combining the residue and residue pair embeddings with cross-attention techniques, while the other classifier was established for structure-based prediction by integrating the structural embedding and a graph neural network. Meanwhile, a machine learning classifier was developed using novel structural descriptors and a random forest model to complement the structural deep learning classifier. By integrating the three classifiers, DeepPCT outperformed existing algorithms in different evaluation scenarios and showed better generalizability on new data owing to its less distance dependency. Availability and implementation Datasets, codes, and models of DeepPCT are freely accessible at https://github.com/hzau-liulab/DeepPCT/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Mechanisms, Machinery, and Dynamics of Chromosome Segregation in Zea mays.
- Author
- Duffy, Marissa E., Ngaw, Michael, Polsky, Shayna E., Marzec, Abby E., Zhang, Sean S., Dzierzgowski, Owen R., and Nannas, Natalie J.
- Subjects
- *PROTEIN structure prediction, *CHROMOSOME segregation, *MEIOTIC drive, *SPINDLE apparatus, *CORN
- Abstract
Zea mays (maize) is both an agronomically important crop and a powerful genetic model system with an extensive molecular toolkit and genomic resources. With these tools, maize is an optimal system for cytogenetic study, particularly in the investigation of chromosome segregation. Here, we review the advances made in maize chromosome segregation, specifically in the regulation and dynamic assembly of the mitotic and meiotic spindle, the inheritance and mechanisms of the abnormal chromosome variant Ab10, the regulation of chromosome–spindle interactions via the spindle assembly checkpoint, and the function of kinetochore proteins that bridge chromosomes and spindles. In this review, we discuss these processes in a species-specific context including features that are both conserved and unique to Z. mays. Additionally, we highlight new protein structure prediction tools and make use of these tools to identify several novel kinetochore and spindle assembly checkpoint proteins in Z. mays. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Structural Insights into Cold-Active Lipase from Glaciozyma antarctica PI12: Alphafold2 Prediction and Molecular Dynamics Simulation.
- Author
-
Matinja, Adamu Idris, Kamarudin, Nor Hafizah Ahmad, Leow, Adam Thean Chor, Oslan, Siti Nurbaya, and Ali, Mohd Shukuri Mohamad
- Subjects
-
*PROTEIN structure prediction, *MOLECULAR dynamics, *COLD adaptation, *LOW temperatures, *CHEMICAL synthesis, *LIPASES
- Abstract
Cold-active enzymes have recently gained popularity because of their high activity at lower temperatures than their mesophilic and thermophilic counterparts, enabling them to withstand harsh reaction conditions and enhance industrial processes. Cold-active lipases are enzymes produced by psychrophiles that live and thrive in extremely cold conditions. Cold-active lipase applications are now growing in the detergency, synthesis of fine chemicals, food processing, bioremediation, and pharmaceutical industries. The cold adaptation mechanisms exhibited by these enzymes are yet to be fully understood. Using phylogenetic analysis and the advanced deep learning-based protein structure prediction tool Alphafold2, we identified an evolutionary process in which a conserved cold-active-like motif is present in a distinct subclade of the tree, and further predicted and simulated the three-dimensional structure of a putative cold-active lipase with the cold-active motif, Glalip03, from Glaciozyma antarctica PI12. Molecular dynamics at low temperatures have revealed global stability over a wide range of temperatures, flexibility, and the ability to cope with changes in water and solvent entropy. Therefore, the knowledge we uncover here will be crucial for future research into how these low-temperature-adapted enzymes maintain their overall flexibility and function at lower temperatures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. A self-adaptive evolutionary algorithm using Monte Carlo Fragment insertion and conformation clustering for the protein structure prediction problem.
- Author
-
Parpinelli, Rafael Stubs, Will, Nilcimar Neitzel, and da Silva, Renan Samuel
- Subjects
-
*PROTEIN structure prediction, *PROTEIN conformation, *EVOLUTIONARY algorithms, *PROTEIN structure, *STRUCTURAL bioinformatics
- Abstract
The Protein Structure Prediction Problem is one of the most important and challenging open problems in Computer Science and Structural Bioinformatics. Accurately predicting protein conformations would significantly impact several fields, such as understanding proteinopathies and developing smart protein-based drugs. As such, this work has as its primary goal to improve the prediction power of ab initio methods by utilizing a self-adaptive evolutionary algorithm using Monte Carlo based fragment insertion and conformational clustering. A meta-heuristic is used as the core of the conformation sampling process with fragment insertion, feeding domain-specific information into the process. The online parameter control routines allow the method to adapt to a protein's structure specificity and behave dynamically in different stages of the optimization process. The results obtained by the proposed method were compared to results obtained from several other algorithms found in the literature. It is possible to conclude that the proposed method is highly competitive in terms of free-energy and RMSD for the protein set used in the experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Protective Antimicrobial Effect of the Potential Vaccine Created on the Basis of the Structure of the IgA1 Protease from Neisseria meningitidis.
- Author
-
Prokopenko, Yuri, Zinchenko, Alexei, Karlinsky, David, Kotelnikova, Olga, Razgulyaeva, Olga, Gordeeva, Elena, Nokel, Elena, Serova, Oxana, Kaliberda, Elena, Zhigis, Larisa, Rumsh, Lev, and Smirnov, Ivan
- Subjects
PROTEIN structure prediction, RECOMBINANT proteins, NEISSERIA meningitidis, MENINGOCOCCAL vaccines, ANTIBODY titer
- Abstract
Background/Objectives: IgA1 protease is one of the virulence factors of Neisseria meningitidis, Haemophilus influenzae and other pathogens causing bacterial meningitis. The aim of this research is to create recombinant proteins based on fragments of the mature IgA1 protease A28–P1004 from N. meningitidis serogroup B strain H44/76. These proteins are potential components of an antimeningococcal vaccine for protection against infections caused by pathogenic strains of N. meningitidis and other bacteria producing serine-type IgA1 proteases. Methods: To obtain promising antigens for creating a vaccine, we designed and obtained several recombinant proteins. These proteins consisted of single or directly connected fragments selected from various regions of the IgA1 protease A28–P1004. The choice of these fragments was based on our calculated data on the distribution of linear and conformational B-cell epitopes and MHC-II T-cell epitopes in the structure of IgA1 protease, taking into account the physicochemical properties of potential compounds and the results of a comparative analysis of the spatial structures of the original IgA1 protease and potential recombinant proteins. We studied the immunogenic and protective effects of the obtained proteins on BALB/c mice against meningococci of serogroups A, B and C. Results: Proteins MA28–P1004-LEH6, MW140–K833-LEH6, MW329–P1004-LEH6, M(W140–H328)-(W412–D604)-(Y866–P1004)-LEH6 and M(W140–Q299)-(Y866–P1004)-LEH6 have shown the following antibody titers, 10^3/titer: 11 ± 1, 6 ± 2, 6 ± 1, 9 ± 1 and 22 ± 3, respectively. Also, the last two proteins have shown the best average degree of protection from N. meningitidis serogroups A, B and C, %: 62 ± 6, 63 ± 5, 67 ± 4 respectively for M(W140–H328)-(W412–D604)-(Y866–P1004)-LEH6 and 70 ± 5, 66 ± 6, 83 ± 3 respectively for M(W140–Q299)-(Y866–P1004)-LEH6. Conclusions: We selected two recombinant proteins consisting of two (M(W140–Q299)-(Y866–P1004)-LEH6) or three (M(W140–H328)-(W412–D604)-(Y866–P1004)-LEH6) linked fragments of IgA1 protease A28–P1004 as candidate active components for an antimeningococcal vaccine. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. Structure‐based clustering and mutagenesis of bacterial tannases reveals the importance and diversity of active site‐capping domains.
- Author
-
Coleman, Tom, Viknander, Sandra, Kirk, Alicia M., Sandberg, David, Caron, Elise, Zelezniak, Aleksej, Krenske, Elizabeth, and Larsbrink, Johan
- Abstract
Tannins are critical plant defense metabolites, enriched in bark and leaves, that protect against microorganisms and insects by binding to and precipitating proteins. Hydrolyzable tannins contain ester bonds which can be cleaved by tannases—serine hydrolases containing so‐called "cap" domains covering their active sites. However, comprehensive insights into the biochemical properties and structural diversity of tannases are limited, especially regarding their cap domains. We here present a code pipeline for structure prediction‐based hierarchical clustering to categorize the whole family of bacterial tannases, and have used it to discover new types of cap domains and other structural insertions among these enzymes. Subsequently, we used two recently identified tannases from the gut/soil bacterium Clostridium butyricum as model systems to explore the biochemical and structural properties of the cap domains of tannases. We demonstrate using molecular dynamics and mutagenesis that the cap domain covering the active site plays a major role in enzyme substrate preference, inhibition, and activity—despite not directly interacting with smaller substrates. The present work provides deeper knowledge into the mechanism, structural dynamics, and diversity of tannases. The structure‐based clustering approach presents a new way of classifying any other enzyme family, and will be of relevance for enzyme types where activity is influenced by variable loop or insert regions appended to a core protein fold. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Predicting multiple conformations of ligand binding sites in proteins suggests that AlphaFold2 may remember too much.
- Author
-
Lazou, Maria, Khan, Omeir, Thu Nguyen, Padhorny, Dzmitry, Kozakov, Dima, Joseph-McCarthy, Diane, and Vajda, Sandor
- Subjects
-
*PROTEIN structure prediction, *LIGAND binding (Biochemistry), *PROTEIN conformation, *BINDING sites, *PROTEIN structure
- Abstract
The goal of this paper is predicting the conformational distributions of ligand binding sites using the AlphaFold2 (AF2) protein structure prediction program with stochastic subsampling of the multiple sequence alignment (MSA). We explored the opening of cryptic ligand binding sites in 16 proteins, where the closed and open conformations define the expected extreme points of the conformational variation. Due to the many structures of these proteins in the Protein Data Bank (PDB), we were able to study whether the distribution of X-ray structures affects the distribution of AF2 models. We have found that AF2 generates both a cluster of open and a cluster of closed models for proteins that have comparable numbers of open and closed structures in the PDB and not too many other conformations. This was observed even with default MSA parameters, thus without further subsampling. In contrast, with the exception of a single protein, AF2 did not yield multiple clusters of conformations for proteins that had imbalanced numbers of open and closed structures in the PDB, or had substantial numbers of other structures. Subsampling improved the results only for a single protein, but very shallow MSA led to incorrect structures. The ability of generating both open and closed conformations for six out of the 16 proteins agrees with the success rates of similar studies reported in the literature. However, we showed that this partial success is due to AF2 “remembering” the conformational distributions in the PDB and that the approach fails to predict rarely seen conformations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. Structure Prediction and Computational Protein Design for Efficient Biocatalysts and Bioactive Proteins.
- Author
-
Buller, Rebecca, Damborsky, Jiri, Hilvert, Donald, and Bornscheuer, Uwe T.
- Abstract
The ability to predict and design protein structures has led to numerous applications in medicine, diagnostics and sustainable chemical manufacture. In addition, the wealth of predicted protein structures has advanced our understanding of how life's molecules function and interact. Honouring the work that has fundamentally changed the way scientists research and engineer proteins, the Nobel Prize in Chemistry in 2024 was awarded to David Baker for computational protein design and jointly to Demis Hassabis and John Jumper, who developed AlphaFold for machine‐learning‐based protein structure prediction. Here, we highlight notable contributions to the development of these computational tools and their importance for the design of functional proteins that are applied in organic synthesis. Notably, both technologies have the potential to impact drug discovery as any therapeutic protein target can now be modelled, allowing the de novo design of peptide binders and the identification of small molecule ligands through in silico docking of large compound libraries. Looking ahead, we highlight future research directions in protein engineering, medicinal chemistry and material design that are enabled by this transformative shift in protein science. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Systematic analysis of the relationship between fold-dependent flexibility and artificial intelligence protein structure prediction.
- Author
-
Haque, Neshatul, Wagenknecht, Jessica B., Ratnasinghe, Brian D., and Zimmermann, Michael T.
- Subjects
-
*PROTEIN structure prediction, *SYNTHETIC proteins, *SCIENTIFIC knowledge, *PROTEIN conformation, *PROTEIN structure, *DEEP learning
- Abstract
Artificial Intelligence (AI)-based deep learning methods for predicting protein structures are reshaping knowledge development and scientific discovery. Recent large-scale application of AI models for protein structure prediction has changed perceptions about complicated biological problems and empowered a new generation of structure-based hypothesis testing. It is well-recognized that proteins have a modular organization according to archetypal folds. However, it is yet to be determined if predicted structures are tuned to one conformation of flexible proteins or if they represent average conformations. Further, it is unknown whether the answer is protein fold-dependent. Therefore, in this study, we analyzed 2878 proteins with at least ten distinct experimental structures available, from which we can estimate protein topological rigidity versus heterogeneity from experimental measurements. We found that AlphaFold v2 (AF2) predictions consistently return one specific form to high accuracy, with 99.68% of distinct folds (n = 623 out of 628) having an experimental structure within 2.5Å RMSD from a predicted structure. Yet, 27.70% and 10.82% of folds (174 and 68 out of 628 folds) have at least one experimental structure over 2.5Å and 5Å RMSD, respectively, from their AI-predicted structure. This information is important for how researchers apply and interpret the output of AF2 and similar tools. Additionally, it enabled us to score fold types according to how homogeneous versus heterogeneous their conformations are. Importantly, folds with high heterogeneity are enriched among proteins which regulate vital biological processes including immune cell differentiation, immune activation, and metabolism. This result demonstrates that a large amount of protein fold flexibility has already been experimentally measured, is vital for critical cellular processes, and is currently unaccounted for in structure prediction databases. Therefore, the structure-prediction revolution begets the protein dynamics revolution! [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. A proteome-wide structural systems approach reveals insights into protein families of all human herpesviruses.
- Author
-
Soh, Timothy K., Ognibene, Sofia, Sanders, Saskia, Schäper, Robin, Kaufer, Benedikt B., and Bosse, Jens B.
- Subjects
PROTEIN structure prediction, VIRAL proteins, TETRAHYDROFOLATE dehydrogenase, NUCLEOSIDE transport proteins, PROTEIN folding
- Abstract
Structure predictions have become invaluable tools, but viral proteins are absent from the EMBL/DeepMind AlphaFold database. Here, we provide proteome-wide structure predictions for all nine human herpesviruses and analyze them in depth with explicit scoring thresholds. By clustering these predictions into structural similarity groups, we identified new families, such as the HCMV UL112-113 cluster, which is conserved in alpha- and betaherpesviruses. A domain-level search found protein families consisting of subgroups with varying numbers of duplicated folds. Using large-scale structural similarity searches, we identified viral proteins with cellular folds, such as the HSV-1 US2 cluster possessing dihydrofolate reductase folds and the EBV BMRF2 cluster that might have emerged from cellular equilibrative nucleoside transporters. Our HerpesFolds database is available at https://www.herpesfolds.org/herpesfolds and displays all models and clusters through an interactive web interface. Here, we show that system-wide structure predictions can reveal homology between viral species and identify potential protein functions. The nine human herpesviruses encode hundreds of genes, but the activity and function of many are unclear. Generating protein structure predictions of entire proteomes, the authors could infer the function of many so-far uncharacterized genes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Upstream open reading frames may contain hundreds of novel human exons.
- Author
-
Ji, Hyun Joo and Salzberg, Steven L.
- Subjects
-
*PROTEIN structure prediction, *AMINO acid sequence, *PROTEIN structure, *HUMAN genes, *MESSENGER RNA
- Abstract
Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found a large majority (87.8%) to be conserved in all four as expected. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. We determined that 541 out of 2,199 upstream ORFs have strong evidence that they can form protein coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform. Author summary: We analyzed over 2000 human sequences that have been proposed to represent novel protein-coding genes, and that reside just upstream of known genes. These "upstream ORFs" (uORFs) would represent a surprisingly large addition to the human gene catalogue, which after decades of refinement now contains just under 20,000 protein-coding genes. They would also create over 2000 new bicistronic genes, which number only 10 in current human annotation databases. 
We hypothesized that rather than novel genes, these sequences might instead represent novel exons that can be spliced into existing protein-coding genes, creating new isoforms of those genes. Using a combination of transcriptional evidence and computational predictions, we show that at least 541 of the previously-described uORFs can be used to create novel protein-coding exons, generating new transcripts and new protein isoforms, but not requiring the addition of entirely new genes to the human gene catalogue. We also demonstrate that the predicted three-dimensional structure of some of the new protein isoforms hints at new or improved functions for existing proteins. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Iron transport pathways in the human malaria parasite Plasmodium falciparum revealed by RNA-sequencing.
- Author
-
Wunderlich, Juliane, Kotov, Vadim, Votborg-Novél, Lasse, Ntalla, Christina, Geffken, Maria, Peine, Sven, Portugal, Silvia, and Strauss, Jan
- Subjects
PROTEIN structure prediction, IRON in the body, ERYTHROCYTES, TRANSFERRIN receptors, HEME oxygenase, TRANSFERRIN, FERRITIN
- Abstract
Host iron deficiency is protective against severe malaria as the human malaria parasite Plasmodium falciparum depends on bioavailable iron from its host to proliferate. The essential pathways of iron acquisition, storage, export, and detoxification in the parasite differ from those in humans, as orthologs of the mammalian transferrin receptor, ferritin, or ferroportin, and a functional heme oxygenase are absent in P. falciparum. Thus, the proteins involved in these processes may be excellent targets for therapeutic development, yet remain largely unknown. Here, we show that parasites cultured in erythrocytes from an iron-deficient donor displayed significantly reduced growth rates compared to those grown in red blood cells from healthy controls. Sequencing of parasite RNA revealed diminished expression of genes involved in overall metabolism, hemoglobin digestion, and metabolite transport under low-iron versus control conditions. Supplementation with hepcidin, a specific ferroportin inhibitor, resulted in increased labile iron levels in erythrocytes, enhanced parasite replication, and transcriptional upregulation of genes responsible for merozoite motility and host cell invasion. Through endogenous GFP tagging of differentially expressed putative transporter genes followed by confocal live-cell imaging, proliferation assays with knockout and knockdown lines, and protein structure predictions, we identified six proteins that are likely required for ferrous iron transport in P. falciparum. Of these, we localized PfVIT and PfZIPCO to cytoplasmic vesicles, PfMRS3 to the mitochondrion, and the novel putative iron transporter PfE140 to the plasma membrane for the first time in P. falciparum. PfNRAMP/PfDMT1 and PfCRT were previously reported to efflux Fe2+ from the digestive vacuole. Our data support a new model for parasite iron homeostasis, in which PfE140 is involved in iron uptake across the plasma membrane, PfMRS3 ensures non-redundant Fe2+ supply to the mitochondrion as the main site of iron utilization, PfVIT transports excess iron into cytoplasmic vesicles, and PfZIPCO exports Fe2+ from these organelles in case of iron scarcity. These results provide new insights into the parasite's response to differential iron availability in its environment and into the mechanisms of iron transport in P. falciparum as promising candidate targets for future antimalarial drugs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Predicting protein interactions of the kinase Lck critical to T cell modulation.
- Author
-
Gao, Mu and Skolnick, Jeffrey
- Subjects
-
*IMMUNE checkpoint proteins, *PROTEIN structure prediction, *CELLULAR control mechanisms, *PROTEIN-tyrosine phosphatase, *T cells
- Abstract
Protein-protein interactions (PPIs) play pivotal roles in directing T cell fate. One key player is the non-receptor tyrosine protein kinase Lck that helps to transduce T cell activation signals. Lck is mediated by other proteins via interactions that are inadequately understood. Here, we use the deep learning method AF2Complex to predict PPIs involving Lck, by screening it against ∼1,000 proteins implicated in immune responses, followed by extensive structural modeling for selected interactions. Remarkably, we describe how Lck may be specifically targeted by a palmitoyltransferase using a phosphotyrosine motif. We uncover "hotspot" interactions between Lck and the tyrosine phosphatase CD45, leading to a significant conformational shift of Lck for activation. Lastly, we present intriguing interactions between the phosphotyrosine-binding domain of Lck and the cytoplasmic tail of the immune checkpoint LAG3 and propose a molecular mechanism for its inhibitory role. Together, this multifaceted study provides valuable insights into T cell regulation and signaling.
• Lck SH2 and SH3 domains are predicted to have multiple interaction partners
• Palmitoyltransferase zDHHC18 targets Lck via a phosphotyrosine motif
• CD45 interacts with Lck at hotspots to activate Lck
• LAG3's cytoplasmic tail blocks access to the SH2 domain of Lck
Gao and Skolnick use deep learning to identify protein-protein interaction partners of tyrosine kinase Lck among ∼1,000 immune-related proteins. Predicted structures for several complexes reveal potential molecular mechanisms, providing insights into their functions in T cell regulation and signaling. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Assessing AF2's ability to predict structural ensembles of proteins.
- Author
-
Riccabona, Jakob R., Spoendlin, Fabian C., Fischer, Anna-Lena M., Loeffler, Johannes R., Quoika, Patrick K., Jenkins, Timothy P., Ferguson, James A., Smorodina, Eva, Laustsen, Andreas H., Greiff, Victor, Forli, Stefano, Ward, Andrew B., Deane, Charlotte M., and Fernández-Quintero, Monica L.
- Subjects
-
*PROTEIN structure prediction, *CYTOSKELETAL proteins, *MOLECULAR dynamics, *PROTEIN conformation, *FREE surfaces
- Abstract
Recent breakthroughs in protein structure prediction have enhanced the precision and speed at which protein configurations can be determined. Additionally, molecular dynamics (MD) simulations serve as a crucial tool for capturing the conformational space of proteins, providing valuable insights into their structural fluctuations. However, the scope of MD simulations is often limited by the accessible timescales and the computational resources available, posing challenges to comprehensively exploring protein behaviors. Recently emerging approaches have focused on expanding the capability of AlphaFold2 (AF2) to predict conformational substates of protein. Here, we benchmark the performance of various workflows that have adapted AF2 for ensemble prediction and compare the obtained structures with ensembles obtained from MD simulations and NMR. We provide an overview of the levels of performance and accessible timescales that can currently be achieved with machine learning (ML) based ensemble generation. Significant minima of the free energy surfaces remain undetected.
• Ensemble prediction quality depends on training input to AlphaFold 2 (AF2)
• MSA subsampling predicts ensembles but may miss key protein conformations
• Current ensembles cannot reliably determine free energy, conformations, or properties
• Ensemble data is crucial to improve conformational model accuracy
Riccabona et al. underscore the importance of accurate structural data in predicting protein structural ensembles. They note that although rapid methods like MSA subsampling can generate ensembles, they often overlook functionally significant conformations, thereby missing crucial kinetic and thermodynamic insights. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. AlphaFold with conformational sampling reveals the structural landscape of homorepeats.
- Author
-
Bonet, David Fernandez, Ranyai, Shahrayar, Aswad, Luay, Lane, David P., Arsenian-Henriksson, Marie, Landreh, Michael, and Lama, Dilraj
- Subjects
-
*PROTEIN structure prediction, *MOLECULAR dynamics, *STRUCTURE-activity relationships, *AMINO acids, *MACHINE learning
- Abstract
Homorepeats are motifs with reiterations of the same amino acid. They are prevalent in proteins associated with diverse physiological functions but also linked to several pathologies. Structural characterization of homorepeats has remained largely elusive, primarily because they generally occur in the disordered regions of proteins. Here, we address this subject by combining structures derived from machine learning with conformational sampling through physics-based simulations. We find that hydrophobic homorepeats have a tendency to fold into structured secondary conformations, while hydrophilic ones predominantly exist in unstructured states. Our data show that the flexibility rendered by disorder is a critical component besides the chemical feature that drives homorepeat composition toward hydrophilicity. The formation of regular secondary structures also influences their solubility, as pathologically relevant homorepeats display a direct correlation between repeat expansion, induction of helicity, and self-assembly. Our study provides critical insights into the conformational landscape of protein homorepeats and their structure-activity relationship.
• Insights into homorepeat structures by integrating AlphaFold with MD simulations
• Homorepeats of different amino acids exhibit significant conformational diversity
• Intrinsic disorder promotes the hydrophilic compositional bias of homorepeats
• Homorepeat length expansion induces disorder-to-order transition and aggregation
Bonet et al. have combined AlphaFold, a state-of-the-art protein structure prediction method, with molecular dynamics simulations to show that homorepeats can fold into diverse conformational states. Their study provides fundamental insights into the structural features of these under-characterized motifs, with potential implications for understanding their biological activity in proteins. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Structure-aware annotation of leucine-rich repeat domains.
- Author
-
Xu, Boyan, Cerbu, Alois, Tralie, Christopher J., Lim, Daven, and Krasileva, Ksenia
- Subjects
-
*PROTEIN structure prediction, *PROTEIN structure, *AMINO acid sequence, *PROTEIN models, *PROTEIN domains, *DEEP learning
- Abstract
Protein domain annotation is typically done by predictive models such as HMMs trained on sequence motifs. However, sequence-based annotation methods are prone to error, particularly in calling domain boundaries and motifs within them. These methods are limited by a lack of structural information accessible to the model. With the advent of deep learning-based protein structure prediction, existing sequence-based domain annotation methods can be improved by taking into account the geometry of protein structures. We develop dimensionality reduction methods to annotate repeat units of the Leucine Rich Repeat solenoid domain. The methods are able to correct mistakes made by existing machine learning-based annotation tools and enable the automated detection of hairpin loops and structural anomalies in the solenoid. The methods are applied to 127 predicted structures of LRR-containing intracellular innate immune proteins in the model plant Arabidopsis thaliana and validated against a benchmark dataset of 172 manually-annotated LRR domains. Author summary: In immune receptors across various organisms, repeating protein structures play a crucial role in recognizing and responding to pathogen threats. These structures resemble the coils of a slinky toy, allowing these receptors to adapt and change over time. One particularly vital but challenging structure to study is the Leucine Rich Repeat (LRR). Traditional methods that rely just on analyzing the sequence of these proteins can miss subtle changes due to rapid evolution. With the introduction of protein structure prediction tools like AlphaFold 2, annotation methods can study the coarser geometric properties of the structure. In this study, we visualize LRR proteins in three dimensions and use a mathematical approach to 'flatten' them into two dimensions, so that the coils form circles. We then used a mathematical concept called winding number to determine the number of repeats and where they are in a protein sequence.
This process helps reveal their repeating patterns with enhanced clarity. When we applied this method to immune receptors from a model plant organism, we found that our approach could accurately identify coiling patterns. Furthermore, we detected errors made by previous methods and highlighted unique structural variations. Our research offers a fresh perspective on understanding immune receptors, potentially influencing studies on their evolution and function. [ABSTRACT FROM AUTHOR]
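The winding-number idea mentioned in the author summary can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the coil has already been flattened to 2D points that circle a known center, and the names `winding_number` and `make_coil` are hypothetical helpers introduced only for this illustration.

```python
import math

def winding_number(points, center=(0.0, 0.0)):
    """Signed number of turns a 2D path makes around `center`.

    Sums the angle change between consecutive points, wrapping each
    step into (-pi, pi], then divides the total by 2*pi.
    """
    cx, cy = center
    total = 0.0
    prev = math.atan2(points[0][1] - cy, points[0][0] - cx)
    for x, y in points[1:]:
        cur = math.atan2(y - cy, x - cx)
        delta = cur - prev
        # Undo the jump that occurs when the angle crosses +/- pi.
        if delta > math.pi:
            delta -= 2 * math.pi
        elif delta <= -math.pi:
            delta += 2 * math.pi
        total += delta
        prev = cur
    return total / (2 * math.pi)

def make_coil(turns, points_per_turn=24):
    """Toy stand-in for a flattened solenoid: points on a unit circle."""
    n = int(turns * points_per_turn)
    step = 2 * math.pi / points_per_turn
    return [(math.cos(i * step), math.sin(i * step)) for i in range(n + 1)]

# A toy path that circles the origin five times (assumed data, not real LRR coordinates).
coil = make_coil(5)
repeats = winding_number(coil)
print(round(repeats, 6))
```

In this toy setting each full turn would correspond to one repeat unit; locating repeat boundaries along the sequence would additionally require tracking where the cumulative angle crosses multiples of 2π, which the sketch does not attempt.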
- Published
- 2024
34. Protein language models learn evolutionary statistics of interacting sequence motifs.
- Author
-
Zhidian Zhang, Wayment-Steele, Hannah K., Brixi, Garyk, Haobo Wang, Kern, Dorothee, and Ovchinnikov, Sergey
- Subjects
-
*GAUSSIAN Markov random fields, *LANGUAGE models, *PROTEIN structure prediction, *PROTEIN structure, *PROTEIN models
- Abstract
Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a "categorical Jacobian" calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 "stores" information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
35. Interplay between Two Paralogous Human Silencing Hub (HuSH) Complexes in Regulating LINE-1 Element Silencing.
- Author
-
Jensvold, Zena D., Flood, Julia R., Christenson, Anna E., and Lewis, Peter W.
- Subjects
PROTEIN structure prediction ,TRANSGENES ,RETROTRANSPOSONS ,INTERFERONS ,AMINO acids - Abstract
The Human Silencing Hub (HuSH) complex silences retrotransposable elements in vertebrates. Here, we identify a second HuSH complex, designated HuSH2, which is centered around TASOR2, a paralog of the core TASOR protein in HuSH. Our findings reveal that HuSH and HuSH2 localize to distinct and non-overlapping genomic loci. Specifically, HuSH localizes to and represses LINE-1 retrotransposons, whereas HuSH2 targets and represses KRAB-ZNFs and interferon signaling and response genes. We use in silico protein structure predictions to simulate MPP8 interactions with TASOR paralogs, guiding amino acid substitutions that disrupted binding to HuSH complexes. These MPP8 transgenes and other constructs reveal the importance of HuSH complex quantities in regulating LINE-1 activity. Furthermore, our results suggest that dynamic changes in TASOR and TASOR2 expression enable cells to finely tune HuSH-mediated silencing. This study offers insights into the interplay of HuSH complexes, highlighting their vital role in retrotransposon regulation. The study identifies HuSH2, a paralogous complex to HuSH, centered around TASOR2, and distinct in its genomic localization and function. HuSH represses LINE-1 retrotransposons, while HuSH2 regulates interferon signaling genes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. Structural and functional insights from the sequences and complex domain architecture of adhesin-like proteins from Methanobrevibacter smithii and Methanosphaera stadtmanae.
- Author
-
Gupta, Anjali Bansal and Seedorf, Henning
- Subjects
PROTEIN structure prediction ,PROTEIN domains ,METHANOGENS ,MOLECULAR interactions ,CARBON dioxide - Abstract
Methanogenic archaea, or methanogens, are crucial in guts and rumens, consuming hydrogen, carbon dioxide, and other fermentation products. While their molecular interactions with other microorganisms are not fully understood, genomic sequences provide information. The first genome sequences of human gut methanogens, Methanosphaera stadtmanae and Methanobrevibacter smithii, revealed genes encoding adhesin-like proteins (ALPs). These proteins were also found in other gut and rumen methanogens, but their characteristics and functions remain largely unknown. This study analyzes the ALP repertoire of M. stadtmanae and M. smithii using AI-guided protein structure predictions of unique ALP domains. Both genomes encode more than 40 ALPs each, comprising over 10% of their genomes. ALPs contain repetitive sequences, many of which are unmatched in protein domain databases. We present unique sequence signatures of conserved ABD repeats in ALPs and propose a classification based on domain architecture. Our study offers insights into ALP features and how methanogens may interact with other microorganisms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Revolutionizing biomolecular structure determination with artificial intelligence.
- Author
-
Li, Han, Lei, Yipin, and Zeng, Jianyang
- Subjects
- *
MOLECULAR biology , *TRANSCRIPTION factors , *DRUG discovery , *SMALL molecules , *PROTEIN structure prediction - Abstract
The article "Revolutionizing biomolecular structure determination with artificial intelligence" discusses the importance of determining biomolecular structures for understanding biological functions and designing therapeutics. It highlights the emergence of advanced deep-learning-based methods like RoseTTAFold All-Atom (RFAA) and AlphaFold3 for generalized biomolecular structure modeling. These approaches aim to predict complex structures involving proteins, small molecules, nucleic acids, ions, and modified residues, enhancing the efficiency and accuracy of biomolecular structure determination. The article also emphasizes the critical role of training data in the success of deep-learning-based methods and the potential of RFAA and AlphaFold3 in advancing drug discovery. [Extracted from the article]
- Published
- 2024
38. The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era.
- Author
-
Keegan, Ronan M., Simpkin, Adam J., and Rigden, Daniel J.
- Subjects
- *
PROTEIN structure prediction , *CRYSTAL structure , *OPEN-ended questions , *FORECASTING - Abstract
The availability of highly accurate protein structure predictions from AlphaFold2 (AF2) and similar tools has hugely expanded the applicability of molecular replacement (MR) for crystal structure solution. Many structures can be solved routinely using raw models, structures processed to remove unreliable parts or models split into distinct structural units. There is therefore an open question around how many and which cases still require experimental phasing methods such as single‐wavelength anomalous diffraction (SAD). Here, this question is addressed using a large set of PDB depositions that were solved by SAD. A large majority (87%) could be solved using unedited or minimally edited AF2 predictions. A further 18 (4%) yield straightforwardly to MR after splitting of the AF2 prediction using Slice'N'Dice, although different splitting methods succeeded on slightly different sets of cases. It is also found that further unique targets can be solved by alternative modelling approaches such as ESMFold (four cases), alternative MR approaches such as ARCIMBOLDO and AMPLE (two cases each), and multimeric model building with AlphaFold‐Multimer or UniFold (three cases). Ultimately, only 12 cases, or 3% of the SAD‐phased set, did not yield to any form of MR tested here, offering valuable hints as to the number and the characteristics of cases where experimental phasing remains essential for macromolecular structure solution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
39. Perspectives Toward an Integrative Structural Biology Pipeline With Atomic Force Microscopy Topographic Images.
- Author
-
Pellequer, Jean‐Luc
- Subjects
- *
ATOMIC force microscopy , *PROTEIN structure prediction , *NUCLEAR magnetic resonance , *SINGLE molecules , *ELECTRON microscopy - Abstract
After the recent double revolutions in structural biology, which include the use of direct detectors for cryo‐electron microscopy resulting in a significant improvement in the expected resolution of large macromolecule structures, and the advent of AlphaFold which allows for near‐accurate prediction of any protein structures, the field of structural biology is now pursuing more ambitious targets, including several MDa assemblies. But complex target systems cannot be tackled using a single biophysical technique. The field of integrative structural biology has emerged as a global solution. The aim is to integrate data from multiple complementary techniques to produce a final three‐dimensional model that cannot be obtained from any single technique. The absence of atomic force microscopy data from integrative structural biology platforms is not necessarily due to its nm resolution, as opposed to Å resolution for x‐ray crystallography, nuclear magnetic resonance, or electron microscopy. Rather a significant issue was that the AFM topographic data lacked interpretability. Fortunately, with the introduction of the AFM‐Assembly pipeline and other similar tools, it is now possible to integrate AFM topographic data into integrative modeling platforms. The advantages of single molecule techniques, such as AFM, include the ability to confirm experimentally any assembled molecular models or to produce alternative conformations that mimic the inherent flexibility of large proteins or complexes. The review begins with a brief overview of the historical developments of AFM data in structural biology, followed by an examination of the strengths and limitations of AFM imaging, which have hindered its integration into modern modeling platforms. This review discusses the correction and improvement of AFM topographic images, as well as the principles behind the AFM‐Assembly pipeline. 
It also presents and discusses a series of challenges that need to be addressed in order to improve the incorporation of AFM data into integrative modeling platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Assessing the Utility of ColabFold and AlphaMissense in Determining Missense Variant Pathogenicity for Congenital Myasthenic Syndromes.
- Author
-
Ryan-Phillips, Finlay, Henehan, Leighann, Ramdas, Sithara, Palace, Jacqueline, Beeson, David, and Dong, Yin Yao
- Subjects
CONGENITAL myasthenic syndromes ,NICOTINIC acetylcholine receptors ,PROTEIN structure prediction ,MISSENSE mutation ,NUCLEOTIDE sequencing - Abstract
Background/Objectives: Congenital myasthenic syndromes (CMSs) are caused by variants in >30 genes with increasing numbers of variants of unknown significance (VUS) discovered by next-generation sequencing. Establishing VUS pathogenicity requires in vitro studies that slow diagnosis and treatment initiation. The recently developed protein structure prediction software AlphaFold2/ColabFold has revolutionized structural biology; such predictions have also been leveraged in AlphaMissense, which predicts ClinVar variant pathogenicity with 90% accuracy. Few reports, however, have tested these tools on rigorously characterized clinical data. We therefore assessed ColabFold and AlphaMissense as diagnostic aids for CMSs, using variants of the CHRN genes that encode the nicotinic acetylcholine receptor (nAChR). Methods: Utilizing a dataset of 61 clinically validated CHRN variants, (1) we evaluated the possibility of a ColabFold metric (either predicted structural disruption, prediction confidence, or prediction quality) that distinguishes variant pathogenicity; (2) we assessed AlphaMissense's ability to differentiate variant pathogenicity; and (3) we compared AlphaMissense to the existing pathogenicity prediction programs AlamutVP and EVE. Results: Analyzing the variant effects on ColabFold CHRN structure prediction, prediction confidence, and prediction quality did not yield any reliable pathogenicity indicative metric. However, AlphaMissense predicted variant pathogenicity with 63.93% accuracy in our dataset—a much greater proportion than AlamutVP (27.87%) and EVE (28.33%). Conclusions: Emerging in silico tools can revolutionize genetic disease diagnosis—however, improvement, refinement, and clinical validation are imperative prior to practical acquisition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Aberrant Splicing in PKD2 in a Family of Korean Patients With Autosomal Dominant Polycystic Kidney Disease.
- Author
-
Soo-Young Yoon, Jin Sug Kim, and Kyung Sun Park
- Subjects
POLYCYSTIC kidney disease ,MEDICAL genetics ,CYSTIC kidney disease ,GENETIC variation ,PROTEIN structure prediction - Abstract
This article discusses a case study of a Korean family with autosomal dominant polycystic kidney disease (ADPKD) caused by a genetic mutation in the PKD2 gene. The study found a variant in the gene that could potentially lead to premature termination of protein production. The authors highlight the importance of genetic testing and sequencing in diagnosing ADPKD and emphasize the need for further research on the efficiency of nonsense-mediated decay in the PKD2 gene. They also discuss the structural changes in the mutant protein and classify the variant as likely pathogenic based on splicing prediction. The article underscores the significance of splicing prediction in interpreting intronic variants. [Extracted from the article]
- Published
- 2024
42. DeepMind: From Games to Scientific Discovery.
- Author
-
Hassabis, Demis
- Subjects
ARTIFICIAL intelligence ,STRATEGY games ,DEEP reinforcement learning ,REINFORCEMENT learning ,PROTEIN structure prediction ,SYNTHETIC biology ,EXPERT systems ,DOPAMINE - Abstract
Demis Hassabis, the visionary cofounder and CEO of DeepMind, has made significant contributions to artificial intelligence (AI) and scientific discovery. His work on AlphaFold, an AI system that predicts protein structures, led to him winning the 2024 Nobel Prize in Chemistry. Hassabis' journey from designing computer games to pioneering AI showcases the transformative power of visionary leadership in innovation development. DeepMind's integration of AI into scientific research and interdisciplinary collaboration highlights the potential for AI to drive transformative advancements in various fields. [Extracted from the article]
- Published
- 2024
43. Unveiling the biosynthesis mechanism of novel lantibiotic homicorcin: an in silico analysis
- Author
-
Md. Amzad Hossain, Md. Rakibul Islam, Omar Faruk, Takeshi Zendo, M. Aftab Uddin, Haseena Khan, and Mohammad Riazul Islam
- Subjects
Lantibiotic ,Homicorcin ,Dehydratase ,Cyclase ,Protein structure prediction ,Protein-protein docking ,Medicine ,Science - Abstract
Jute endophyte Staphylococcus hominis strain MBL_AB63 was reported to produce a novel antimicrobial peptide, ‘homicorcin’. This exhibits potential activity against a broad spectrum of Gram-positive bacteria. Eight genes were predicted to be involved in the sequential maturation of this peptide antibiotic, which includes structural (homA), dehydratase (homB), cyclase (homC), peptidase (homP), immunity (homI), oxidoreductase (homO), ATP-binding cassette transporter (homT1), and permease (homT2), respectively. Among the modification enzymes, HomB, HomC, and HomP exhibit sequence similarities with class I lantibiotic dehydratase, cyclase, and leader peptidase, respectively. The current study investigated the sequential modifications and secretion of homicorcin by constructing robust computational protein models and analyzing their interaction patterns using protein-protein docking techniques. To enhance comprehension of the protein arrangement, their subcellular localization was also extrapolated. The findings demonstrate a network of proteins that works in a synchronized manner, where HomC functions as an intermediary between HomB and the transporter (HomT). Following its dehydration by HomB and cyclization by HomC, the pro-homicorcin is taken out of the cell by the transporter and processed by HomP, resulting in the production of matured, processed homicorcin. This biosynthesis model for homicorcin will lay the groundwork for the sustainable and efficient production of this peptide antibiotic.
- Published
- 2024
44. Structure prediction and refinement of protein sequence to identify intrinsically disordered regions of islet amyloid polypeptide (IAPP) using in-silico approach.
- Author
-
Reddy, Dorankula Viswateja and John, Arun
- Subjects
- *
AMYLIN , *PROTEIN structure prediction , *TYPE 2 diabetes , *AMINO acid sequence , *PROTEIN structure - Abstract
IAPP, or islet amyloid polypeptide, is a protein that plays a crucial role in type II diabetes. The purpose of this work is to investigate IAPP from a structural and sequence perspective. Methods and tools: we acquired the sequence from UniProt and then used AlphaFold to predict its structure. Execution of the FG-MD algorithm was not possible because no disordered regions were present. The ProtParam server was used for sequence analysis; the sequence of the IAPP protein was submitted through this site. We used the Phyre2 server to carry out the structural analysis, and this prediction provides the PDB file for the target protein. The structure was examined by Ramachandran plot validation using SAVES. A structure that is both compact and highly precise is required in order to validate the target protein (IAPP). [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. Critical assessment of methods of protein structure prediction (CASP)—Round XV
- Author
-
Kryshtafovych, Andriy, Schwede, Torsten, Topf, Maya, Fidelis, Krzysztof, and Moult, John
- Subjects
Biochemistry and Cell Biology ,Bioinformatics and Computational Biology ,Biological Sciences ,Networking and Information Technology R&D (NITRD) ,Machine Learning and Artificial Intelligence ,Protein Conformation ,Models ,Molecular ,Proteins ,Amino Acid Sequence ,Computational Biology ,CASP ,community wide experiment ,protein structure prediction ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.
- Published
- 2023
46. Protein target highlights in CASP15: Analysis of models by structure providers
- Author
-
Alexander, Leila T, Durairaj, Janani, Kryshtafovych, Andriy, Abriata, Luciano A, Bayo, Yusupha, Bhabha, Gira, Breyton, Cécile, Caulton, Simon G, Chen, James, Degroux, Séraphine, Ekiert, Damian C, Erlandsen, Benedikte S, Freddolino, Peter L, Gilzer, Dominic, Greening, Chris, Grimes, Jonathan M, Grinter, Rhys, Gurusaran, Manickam, Hartmann, Marcus D, Hitchman, Charlie J, Keown, Jeremy R, Kropp, Ashleigh, Kursula, Petri, Lovering, Andrew L, Lemaitre, Bruno, Lia, Andrea, Liu, Shiheng, Logotheti, Maria, Lu, Shuze, Markússon, Sigurbjörn, Miller, Mitchell D, Minasov, George, Niemann, Hartmut H, Opazo, Felipe, Phillips, George N, Davies, Owen R, Rommelaere, Samuel, Rosas‐Lemus, Monica, Roversi, Pietro, Satchell, Karla, Smith, Nathan, Wilson, Mark A, Wu, Kuan‐Lin, Xia, Xian, Xiao, Han, Zhang, Wenhua, Zhou, Z Hong, Fidelis, Krzysztof, Topf, Maya, Moult, John, and Schwede, Torsten
- Subjects
Biochemistry and Cell Biology ,Bioinformatics and Computational Biology ,Biological Sciences ,Generic health relevance ,Protein Conformation ,Models ,Molecular ,Computational Biology ,Proteins ,CASP ,cryo-EM ,protein structure prediction ,X-ray crystallography ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
We present an in-depth analysis of selected CASP15 targets, focusing on their biological and functional significance. The authors of the structures identify and discuss key protein features and evaluate how effectively these aspects were captured in the submitted predictions. While the overall ability to predict three-dimensional protein structures continues to impress, reproducing uncommon features not previously observed in experimental structures is still a challenge. Furthermore, instances with conformational flexibility and large multimeric complexes highlight the need for novel scoring strategies to better emphasize biologically relevant structural regions. Looking ahead, closer integration of computational and experimental techniques will play a key role in determining the next challenges to be unraveled in the field of structural molecular biology.
- Published
- 2023
47. Tertiary structure assessment at CASP15
- Author
-
Simpkin, Adam J, Mesdaghi, Shahram, Rodríguez, Filomeno Sánchez, Elliott, Luc, Murphy, David L, Kryshtafovych, Andriy, Keegan, Ronan M, and Rigden, Daniel J
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Furylfuramide ,Computational Biology ,Models ,Molecular ,Proteins ,Sequence Alignment ,CASP15 ,machine learning ,molecular replacement ,protein modelling ,protein structure prediction ,structural bioinformatics ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
- Published
- 2023
48. To split or not to split: CASP15 targets and their processing into tertiary structure evaluation units
- Author
-
Kryshtafovych, Andriy and Rigden, Daniel J
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Protein Folding ,Models ,Molecular ,Computational Biology ,Databases ,Protein ,Proteins ,CASP15 ,evaluation units ,protein domains ,protein structure ,protein structure prediction ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors' performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.
- Published
- 2023
49. Structural insights of the p97/VCP AAA+ ATPase: How adapter interactions coordinate diverse cellular functionality.
- Author
-
Braxton, Julian and Southworth, Daniel
- Subjects
AAA+ ATPase ,AlphaFold ,ERAD ,VCP ,adapter ,adaptor ,autophagy ,cofactor ,cryo-EM ,molecular chaperone ,p97 ,protein structure prediction ,proteostasis ,unfoldase ,Humans ,Adaptor Proteins ,Signal Transducing ,Valosin Containing Protein ,Protein Folding ,Protein Domains ,Models ,Molecular ,Protein Structure ,Quaternary - Abstract
p97/valosin-containing protein is an essential eukaryotic AAA+ ATPase with diverse functions including protein homeostasis, membrane remodeling, and chromatin regulation. Dysregulation of p97 function causes severe neurodegenerative disease and is associated with cancer, making this protein a significant therapeutic target. p97 extracts polypeptide substrates from macromolecular assemblies by hydrolysis-driven translocation through its central pore. Growing evidence indicates that this activity is highly coordinated by adapter partner proteins, of which more than 30 have been identified and are commonly described to facilitate translocation through substrate recruitment or modification. In so doing, these adapters enable critical p97-dependent functions such as extraction of misfolded proteins from the endoplasmic reticulum or mitochondria, and are likely the reason for the extreme functional diversity of p97 relative to other AAA+ translocases. Here, we review the known functions of adapter proteins and highlight recent structural and biochemical advances that have begun to reveal the diverse molecular bases for adapter-mediated regulation of p97 function. These studies suggest that the range of mechanisms by which p97 activity is controlled is vastly underexplored with significant advances possible for understanding p97 regulation by the most known adapters.
- Published
- 2023
50. AI versus the brain.
- Author
-
Stone, James V.
- Subjects
- *
ARTIFICIAL intelligence , *LANGUAGE models , *ARTIFICIAL neural networks , *BELL'S theorem , *PROTEIN structure prediction , *NEUROLINGUISTICS - Abstract
The article "AI versus the brain" discusses the differences between artificial intelligence (AI) systems and the human brain. AI researchers focus on building systems that can solve specific tasks, while the human brain has complex dynamics and memory capabilities that AI lacks. Despite AI advancements in areas like game-playing and protein structure prediction, AI systems like ChatGPT are limited in their problem-solving abilities compared to human brains. The article highlights the unique strengths and limitations of AI systems in comparison to the human brain. [Extracted from the article]
- Published
- 2024