Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

101
(from Reference Citation Analysis)

Article PDFs (46)

Cited by > 0 (65)

Searched Name

protein structure predictions


Number	Citation Analysis
1	The S-component fold: a link between bacterial transporters and receptors. Commun Biol 2024;7:610. [PMID: 38773269 PMCID: PMC11109136 DOI: 10.1038/s42003-024-06295-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 05/06/2024] [Indexed: 05/23/2024] [Open Access] Abstract: The processes of nutrient uptake and signal sensing are crucial for microbial survival and adaptation. Membrane-embedded proteins involved in these functions (transporters and receptors) are commonly regarded as unrelated in terms of sequence, structure, mechanism of action and evolutionary history. Here, we analyze the protein structural universe using recently developed artificial intelligence-based structure prediction tools, and find an unexpected link between prominent groups of microbial transporters and receptors. The so-called S-components of Energy-Coupling Factor (ECF) transporters, and the membrane domains of sensor histidine kinases of the 5TMR cluster share a structural fold. The discovery of their relatedness manifests a widespread case of prokaryotic "transceptors" (related proteins with transport or receptor function), showcases how artificial intelligence-based structure predictions reveal uncharted evolutionary connections between proteins, and provides new avenues for engineering transport and signaling functions in bacteria.
Key Words: membrane proteins; protein structure predictions
MeSH Headings: Bacterial Proteins/metabolism; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Membrane Transport Proteins/metabolism; Membrane Transport Proteins/chemistry; Membrane Transport Proteins/genetics; Histidine Kinase/metabolism; Histidine Kinase/chemistry; Histidine Kinase/genetics; Models, Molecular; Bacteria/metabolism; Bacteria/genetics; Signal Transduction; Protein Folding; Artificial Intelligence
2	Structure-based epitope prediction and assessment of cross-reactivity of Myrmecia pilosula venom-specific IgE and recombinant Sol g proteins (Solenopsis geminata). Sci Rep 2024;14:11145. [PMID: 38750087 PMCID: PMC11096326 DOI: 10.1038/s41598-024-61843-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 05/10/2024] [Indexed: 05/18/2024] [Open Access] Abstract: The global distribution of tropical fire ants (Solenopsis geminata) raises concerns about anaphylaxis and serious medical issues in numerous countries. This investigation focused on the cross-reactivity of allergen-specific IgE antibodies between S. geminata and Myrmecia pilosula (Jack Jumper ant) venom proteins due to the potential emergence of cross-reactive allergies in the future. Antibody epitope analysis unveiled one predominant conformational epitope on Sol g 1.1 (PI score of 0.989), followed by Sol g 2.2, Sol g 4.1, and Sol g 3.1. Additionally, Pilosulin 1 showed high allergenic potential (PI score of 0.94), with Pilosulin 5a (PI score of 0.797) leading in B-cell epitopes. The sequence analysis indicated that Sol g 2.2 and Sol g 4.1 pose a high risk of cross-reactivity with Pilosulins 4.1a and 5a. Furthermore, the cross-reactivity of recombinant Sol g proteins with M. pilosula-specific IgE antibodies from 41 patients revealed high cross-reactivity for r-Sol g 3.1 (58.53%) and r-Sol g 4.1 (43.90%), followed by r-Sol g 2.2 (26.82%), and r-Sol g 1.1 (9.75%). Therefore, this study demonstrates cross-reactivity (85.36%) between S. geminata and M. pilosula, highlighting the allergenic risk. Understanding these reactions is vital for the prevention of severe allergic reactions, especially in individuals with pre-existing Jack Jumper ant allergy, informing future management strategies.
Key Words: computational models; protein function predictions; protein structure predictions; biochemistry; risk factors
MeSH Headings: Immunoglobulin E/immunology; Cross Reactions/immunology; Animals; Humans; Ant Venoms/immunology; Ants/immunology; Allergens/immunology; Epitopes/immunology; Recombinant Proteins/immunology; Insect Proteins/immunology; Female; Adult; Male; Amino Acid Sequence; Middle Aged; Adolescent; Young Adult
Grants: B01F650006 The Program Management Unit for Human Resources and Institutional Development, Research and Innovation (PMU-B) for postdoctoral scholarship; The Royal Golden Jubilee Ph.D. Program, Thailand; ARC Future Fellowship; The Fundamental Fund of Khon Kaen University (KKU); The National Science, Research and Innovation Fund (NSRF), Thailand
3	Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures. Sci Data 2024;11:458. [PMID: 38710720 PMCID: PMC11074267 DOI: 10.1038/s41597-024-03299-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/23/2024] [Indexed: 05/08/2024] [Open Access] Abstract: The advent of single-particle cryo-electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological molecules and their complexes at atomic resolution. The high-resolution structures of biological macromolecules and their complexes significantly expedite biomedical research and drug discovery. However, automatically and accurately building atomic models from high-resolution cryo-EM density maps is still time-consuming and challenging when template-based models are unavailable. Artificial intelligence (AI) methods such as deep learning trained on a limited amount of labeled cryo-EM density maps generate inaccurate atomic models. To address this issue, we created a dataset called Cryo2StructData consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labelled according to their corresponding known atomic structures for training and testing AI methods to build atomic models from cryo-EM density maps. Cryo2StructData is larger than existing, publicly available datasets for training AI methods to build atomic protein structures from cryo-EM density maps. We trained and tested deep learning models on Cryo2StructData to validate its quality, showing that it is ready to be used to train and test AI methods for building atomic models.
Key Words: data processing; cryoelectron microscopy; protein structure predictions; computer science
MeSH Headings: Cryoelectron Microscopy/methods; Artificial Intelligence; Proteins/chemistry; Proteins/ultrastructure; Models, Molecular; Protein Conformation
Grants: R01 GM093123 NIGMS NIH HHS; R01 GM146340 NIGMS NIH HHS; U.S. Department of Health & Human Services | National Institutes of Health (NIH) - R01GM146340; U.S. Department of Health & Human Services | National Institutes of Health (NIH) - R01GM093123; U.S. Department of Energy (DOE) - DE-SC0020400; U.S. Department of Energy (DOE) - DE-SC0021303; National Science Foundation (NSF) - DBI2308699; U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health) - R01GM146340
4	Immunoinformatics design of a structural proteins driven multi-epitope candidate vaccine against different SARS-CoV-2 variants based on fynomer. Sci Rep 2024;14:10297. [PMID: 38704475 PMCID: PMC11069592 DOI: 10.1038/s41598-024-61025-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 04/30/2024] [Indexed: 05/06/2024] [Open Access] Abstract: The ideal vaccines for combating diseases that may emerge in the future require more than simply inactivating a few pathogenic strains. This study aims to provide a peptide-based multi-epitope vaccine effective against various severe acute respiratory syndrome coronavirus 2 strains. To design the vaccine, a library of peptides from the spike, nucleocapsid, membrane, and envelope structural proteins of various strains was prepared. Then, the final vaccine structure was optimized using the fully protected epitopes and the fynomer scaffold. Using bioinformatics tools, the antigenicity, allergenicity, toxicity, physicochemical properties, population coverage, and secondary and three-dimensional structures of the vaccine candidate were evaluated. The bioinformatic analyses confirmed the high quality of the vaccine. According to further investigations, this structure is similar to native protein and there is a stable and strong interaction between vaccine and receptors. Based on molecular dynamics simulation, structural compactness and stability in binding were also observed. In addition, the immune simulation showed that the vaccine can stimulate immune responses similar to real conditions. Finally, codon optimization and in silico cloning confirmed efficient expression in Escherichia coli. In conclusion, the fynomer-based vaccine can be considered as a new style in designing and updating vaccines to protect against coronavirus disease.
Key Words: computational biology and bioinformatics; predictive medicine; protein analysis; protein design; protein structure predictions
MeSH Headings: SARS-CoV-2/immunology; SARS-CoV-2/genetics; COVID-19 Vaccines/immunology; Humans; Computational Biology/methods; COVID-19/prevention & control; COVID-19/immunology; COVID-19/virology; Molecular Dynamics Simulation; Epitopes/immunology; Epitopes/chemistry; Spike Glycoprotein, Coronavirus/immunology; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/chemistry; Immunoinformatics
5	Author Correction: High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024;15:3089. [PMID: 38600144 PMCID: PMC11006925 DOI: 10.1038/s41467-024-47504-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024] [Open Access]
Key Words: molecular conformation; protein structure predictions; computational biophysics
6	Hairpin trimer transition state of amyloid fibril. Nat Commun 2024;15:2756. [PMID: 38553453 PMCID: PMC10980705 DOI: 10.1038/s41467-024-46446-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 02/28/2024] [Indexed: 04/02/2024] [Open Access] Abstract: Protein fibril self-assembly is a universal transition implicated in neurodegenerative diseases. Although fibril structure/growth are well characterized, fibril nucleation is poorly understood. Here, we use a computational-experimental approach to resolve fibril nucleation. We show that monomer hairpin content quantified from molecular dynamics simulations is predictive of experimental fibril formation kinetics across a tau motif mutant library. Hairpin trimers are predicted to be fibril transition states; one hairpin spontaneously converts into the cross-beta conformation, templating subsequent fibril growth. We designed a disulfide-linked dimer mimicking the transition state that catalyzes fibril formation, measured by ThT fluorescence and TEM, of wild-type motif - which does not normally fibrillize. A dimer compatible with extended conformations but not the transition-state fails to nucleate fibril at any concentration. Tau repeat domain simulations show how long-range interactions sequester this motif in a mutation-dependent manner. This work implies that different fibril morphologies could arise from disease-dependent hairpin seeding from different loci.
Key Words: protein structure predictions; computational biophysics; intrinsically disordered proteins; reaction kinetics and dynamics; protein aggregation
MeSH Headings: Amyloid/metabolism; Protein Conformation; Molecular Dynamics Simulation; Protein Structure, Secondary; Amyloid beta-Peptides/metabolism
Grants: F31 NS127513 NINDS NIH HHS; R35 GM150897 NIGMS NIH HHS; RF1 AG076459 NIA NIH HHS; NIH MIRA R35GM150897-01
7	High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024;15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024] [Open Access] Abstract: This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
Key Words: molecular conformation; protein structure predictions; computational biophysics
MeSH Headings: Protein Conformation; Mutation; Point Mutation; Sequence Alignment
Grants: R01 GM144451 NIGMS NIH HHS; National Science Foundation (NSF); U.S. Department of Health & Human Services | NIH | National Cancer Institute (NCI); Blavatnik Family Foundation
8	cyp51A mutations, protein modeling, and efflux pump gene expression reveals multifactorial complexity towards understanding Aspergillus section Nigri azole resistance mechanism. Sci Rep 2024;14:6156. [PMID: 38486086 PMCID: PMC10940716 DOI: 10.1038/s41598-024-55237-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/21/2024] [Indexed: 03/18/2024] [Open Access] Abstract: Black Aspergillus species are the most common etiological agents of otomycosis, and pulmonary aspergillosis. However, limited data is available on their antifungal susceptibility profiles and associated resistance mechanisms. Here, we determined the azole susceptibility profiles of black Aspergillus species isolated from the Indian environment and explored the potential resistance mechanisms through cyp51A gene sequencing, protein homology modeling, and expression analysis of selected genes cyp51A, cyp51B, mdr1, and mfs based on their role in imparting resistance against antifungal drugs. In this study, we have isolated a total of 161 black aspergilli isolates from 174 agricultural soil samples. Isolates had variable resistance towards medical azoles; approximately 11.80%, 3.10%, and 1.24% of isolates were resistant to itraconazole (ITC), posaconazole (POS), and voriconazole (VRC), respectively. Further, cyp51A sequence analysis showed that non-synonymous mutations were present in 20 azole-resistant Aspergillus section Nigri and 10 susceptible isolates. However, Cyp51A homology modeling indicated insignificant protein structural variations because of these mutations. Most of the isolates showed the overexpression of mdr1, and mfs genes. Hence, the study concluded that azole-resistance in section Nigri cannot be attributed exclusively to the cyp51A gene mutation or its overexpression. However, overexpression of mdr1 and mfs genes may have a potential role in drug resistance.
Key Words: antimicrobials; fungal pathogenesis; protein structure predictions; fungal infection
MeSH Headings: Antifungal Agents/pharmacology; Azoles/pharmacology; Aspergillosis/microbiology; Fungal Proteins/genetics; Fungal Proteins/metabolism; Drug Resistance, Fungal/genetics; Aspergillus/metabolism; Mutation; Gene Expression
9	Impact of mutations on the stability of SARS-CoV-2 nucleocapsid protein structure. Sci Rep 2024;14:5870. [PMID: 38467657 PMCID: PMC10928099 DOI: 10.1038/s41598-024-55157-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024] [Open Access] Abstract: The nucleocapsid (N) protein of SARS-CoV-2 is known to participate in various host cellular processes, including interferon inhibition, RNA interference, apoptosis, and regulation of virus life cycles. Additionally, it has potential as a diagnostic antigen and/or immunogen. Our research focuses on examining structural changes caused by mutations in the N protein. We have modeled the complete tertiary structure of native and mutated forms of the N protein using AlphaFold2. Notably, the N protein contains 3 disordered regions. The focus was on investigating the impact of mutations on the stability of the protein's dimeric structure based on binding free energy calculations (MM-PB/GB-SA) and RMSD fluctuations after MD simulations. The results demonstrated that, compared with the wild-type N protein, 28 of the 37 selected mutations resulted in a stable dimeric structure, while 9 mutations led to destabilization. Our results are important for understanding the tertiary structure of the SARS-CoV-2 N protein dimer, the effect of mutations on it, and their behavior in the host cell, as well as for research on other viruses belonging to the same genus and for anticipating potential strategies for addressing this viral illness.
Key Words: computational models; molecular modelling; protein structure predictions
MeSH Headings: Humans; SARS-CoV-2/genetics; SARS-CoV-2/metabolism; COVID-19/genetics; Nucleocapsid Proteins/metabolism; Nucleocapsid/genetics; Nucleocapsid/metabolism; Mutation
Grants: 22AA-1F026 Ministry of Education, Science, Culture and Sport RA, Higher Education and Science Committee; 21AG-1F057 Ministry of Education, Science, Culture and Sport RA, Higher Education and Science Committee
10	CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat Methods 2024;21:477-487. [PMID: 38326495 PMCID: PMC10927564 DOI: 10.1038/s41592-024-02174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/09/2024] [Indexed: 02/09/2024] Abstract: Deep learning models, such as AlphaFold2 and RosettaFold, enable high-accuracy protein structure prediction. However, large protein complexes are still challenging to predict due to their size and the complexity of interactions between multiple subunits. Here we present CombFold, a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2. CombFold accurately predicted (TM-score >0.7) 72% of the complexes among the top-10 predictions in two datasets of 60 large, asymmetric assemblies. Moreover, the structural coverage of predicted complexes was 20% higher compared to corresponding Protein Data Bank entries. We applied the method on complexes from Complex Portal with known stoichiometry but without known structure and obtained high-confidence predictions. CombFold supports the integration of distance restraints based on crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. CombFold's high accuracy makes it a promising tool for expanding structural coverage beyond monomeric proteins.
Key Words: protein structure predictions; data integration; computational models
MeSH Headings: Algorithms; Databases, Protein; Mass Spectrometry
Grants: R01 AI163011 NIAID NIH HHS; R01 GM129325 NIGMS NIH HHS; Israel Science Foundation (ISF)
11	Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat Methods 2024;21:465-476. [PMID: 38297184 PMCID: PMC10927563 DOI: 10.1038/s41592-023-02159-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/28/2023] [Accepted: 12/20/2023] [Indexed: 02/02/2024] Abstract: Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
Key Words: computational biophysics; protein structure predictions; sequence annotation
MeSH Headings: Intrinsically Disordered Proteins/chemistry; Protein Conformation; Polymers
Grants: RGP0015/2022 Human Frontier Science Program (HFSP); 2128068 NSF | BIO | Division of Molecular and Cellular Biosciences (MCB); 2139839 NSF | BIO | Division of Molecular and Cellular Biosciences (MCB); 2213983 NSF | BIO | Division of Biological Infrastructure (DBI); Longer Life Foundation (LLF); Milli Sigma Foundation Fellowship (no grant number)
12	DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat Commun 2024;15:1071. [PMID: 38316797 PMCID: PMC10844226 DOI: 10.1038/s41467-024-45461-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 01/24/2024] [Indexed: 02/07/2024] [Open Access] Abstract: While significant advances have been made in predicting static protein structures, the inherent dynamics of proteins, modulated by ligands, are crucial for understanding protein function and facilitating drug discovery. Traditional docking methods, frequently used in studying protein-ligand interactions, typically treat proteins as rigid. While molecular dynamics simulations can propose appropriate protein conformations, they're computationally demanding due to rare transitions between biologically relevant equilibrium states. In this study, we present DynamicBind, a deep learning method that employs equivariant geometric diffusion networks to construct a smooth energy landscape, promoting efficient transitions between different equilibrium states. DynamicBind accurately recovers ligand-specific conformations from unbound protein structures without the need for holo-structures or extensive sampling. Remarkably, it demonstrates state-of-the-art performance in docking and virtual screening benchmarks. Our experiments reveal that DynamicBind can accommodate a wide range of large protein conformational changes and identify cryptic pockets in unseen protein targets. As a result, DynamicBind shows potential in accelerating the development of small molecules for previously undruggable targets and expanding the horizons of computational drug discovery.
Key Words: protein structure predictions; drug screening; virtual drug screening
MeSH Headings: Ligands; Proteins/metabolism; Protein Conformation; Molecular Dynamics Simulation; Drug Discovery; Protein Binding; Molecular Docking Simulation
13	Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024;21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024] Abstract: Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Key Words: protein structure predictions; computational models; machine learning
MeSH Headings: Deep Learning; Computational Biology/methods; Proteins/genetics; Proteins/chemistry; Sequence Alignment; Genomics; Algorithms
Grants: R35 GM136422 NIGMS NIH HHS; S10 OD026825 NIH HHS; R01 AI134678 NIAID NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Division of Intramural Research, National Institute of Allergy and Infectious Diseases (Division of Intramural Research of the NIAID); National Science Foundation (NSF)
14	Identification and prioritisation of potential vaccine candidates using subtractive proteomics and designing of a multi-epitope vaccine against Wuchereria bancrofti. Sci Rep 2024;14:1970. [PMID: 38263422 PMCID: PMC10806236 DOI: 10.1038/s41598-024-52457-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] [Open Access] Abstract: This study employed subtractive proteomics and immunoinformatics to analyze the Wuchereria bancrofti proteome and identify potential therapeutic targets, with a focus on designing a vaccine against the parasite species. A comprehensive bioinformatics analysis of the parasite's proteome identified 51 probable therapeutic targets, among which "Kunitz/bovine pancreatic trypsin inhibitor domain-containing protein" was identified as the most promising vaccine candidate. The candidate protein was used to design a multi-epitope vaccine, incorporating B-cell and T-cell epitopes identified through various tools. The vaccine construct underwent extensive analysis of its antigenic, physical, and chemical features, including the determination of secondary and tertiary structures. Docking and molecular dynamics simulations were performed with HLA alleles, Toll-like receptor 4 (TLR4), and TLR3 to assess its potential to elicit the human immune response. Immune simulation analysis confirmed the predicted vaccine's strong binding affinity with immunoglobulins, indicating its potential efficacy in generating an immune response. However, experimental validation and testing of this multi-epitope vaccine construct would be needed to assess its potential against W. bancrofti and even for a broader range of lymphatic filarial infections given the similarities between W. bancrofti and Brugia.
Key Words: bioinformatics; data acquisition; data processing; protein structure predictions; proteome informatics
MeSH Headings: Humans; Animals; Cattle; Wuchereria bancrofti; Proteome; Proteomics; Epitopes, T-Lymphocyte; Aprotinin; Molecular Dynamics Simulation
15	From interaction networks to interfaces, scanning intrinsically disordered regions using AlphaFold2. Nat Commun 2024;15:597. [PMID: 38238291 PMCID: PMC10796318 DOI: 10.1038/s41467-023-44288-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/19/2023] [Accepted: 12/07/2023] [Indexed: 01/22/2024] [Open Access] Abstract: The revolution brought about by AlphaFold2 opens promising perspectives to unravel the complexity of protein-protein interaction networks. The analysis of interaction networks obtained from proteomics experiments does not systematically provide the delimitations of the interaction regions. This is of particular concern in the case of interactions mediated by intrinsically disordered regions, in which the interaction site is generally small. Using a dataset of protein-peptide complexes involving intrinsically disordered regions that are non-redundant with the structures used in AlphaFold2 training, we show that when using the full sequences of the proteins, AlphaFold2-Multimer only achieves 40% success rate in identifying the correct site and structure of the interface. By delineating the interaction region into fragments of decreasing size and combining different strategies for integrating evolutionary information, we manage to raise this success rate up to 90%. We obtain similar success rates using a much larger dataset of protein complexes taken from the ELM database. Beyond the correct identification of the interaction site, our study also explores specificity issues. We show the advantages and limitations of using the AlphaFold2 confidence score to discriminate between alternative binding partners, a task that can be particularly challenging in the case of small interaction motifs.
Key Words: protein structure predictions; protein-protein interaction networks
MeSH Headings: Proteins/metabolism; Protein Interaction Maps; Biological Evolution; Intrinsically Disordered Proteins/metabolism; Protein Binding
Grants: ANR-18-CE45-0005 Agence Nationale de la Recherche (French National Research Agency)
16	Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling. Commun Biol 2024;7:62. [PMID: 38191620 PMCID: PMC10774428 DOI: 10.1038/s42003-023-05744-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 12/26/2023] [Indexed: 01/10/2024] Abstract: Antibodies with lambda light chains (λ-antibodies) are generally considered to be less developable than those with kappa light chains (κ-antibodies). Though this hypothesis has not been formally established, it has led to substantial systematic biases in drug discovery pipelines and thus contributed to kappa dominance amongst clinical-stage therapeutics. However, the identification of increasing numbers of epitopes preferentially engaged by λ-antibodies shows there is a functional cost to neglecting to consider them as potential lead candidates. Here, we update our Therapeutic Antibody Profiler (TAP) tool to use the latest data and machine learning-based structure prediction, and apply it to evaluate developability risk profiles for κ-antibodies and λ-antibodies based on their surface physicochemical properties. We find that while human λ-antibodies on average have a higher risk of developability issues than κ-antibodies, a sizeable proportion are assigned lower-risk profiles by TAP and should represent more tractable candidates for therapeutic development. Through a comparative analysis of the low- and high-risk populations, we highlight opportunities for strategic design that TAP suggests would enrich for more developable λ-antibodies. Overall, we provide context to the differing developability of κ- and λ-antibodies, enabling a rational approach to incorporate more diversity into the initial pool of immunotherapeutic candidates.
Key Words: protein design; applied immunology; antibody therapy; protein structure predictions; biophysical chemistry
MeSH Headings: Humans; Antibodies/therapeutic use; Drug Discovery; Epitopes; Machine Learning; Surface Properties
Grants: Wellcome Trust; RCUK | Engineering and Physical Sciences Research Council (EPSRC); Boehringer Ingelheim (Boehringer Ingelheim Pharmaceuticals)
17	Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nat Commun 2024;15:313. [PMID: 38182565 PMCID: PMC10770089 DOI: 10.1038/s41467-023-43720-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/16/2023] [Indexed: 01/07/2024] Abstract: Geometric deep learning has been revolutionizing the molecular modeling field. Although state-of-the-art neural network models are approaching ab initio accuracy for molecular property prediction, their applications, such as drug discovery and molecular dynamics (MD) simulation, have been hindered by insufficient utilization of geometric information and high computational costs. Here we propose an equivariant geometry-enhanced graph neural network called ViSNet, which elegantly extracts geometric features and efficiently models molecular structures with low computational costs. Our proposed ViSNet outperforms state-of-the-art approaches on multiple MD benchmarks, including MD17, revised MD17 and MD22, and achieves excellent chemical property prediction on QM9 and Molecule3D datasets. Furthermore, through a series of simulations and case studies, ViSNet can efficiently explore the conformational space and provide reasonable interpretability to map geometric representations to molecular structures.
Key Words: computational biology and bioinformatics; chemical biology; molecular modelling; computational models; protein structure predictions
18	AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat Methods 2024;21:110-116. [PMID: 38036854 PMCID: PMC10776388 DOI: 10.1038/s41592-023-02087-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/30/2023] [Accepted: 10/11/2023] [Indexed: 12/02/2023] Abstract: Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
Key Words: x-ray crystallography; protein analysis; protein structure predictions
MeSH Headings: Artificial Intelligence; Crystallography; Mental Processes; Protein Conformation
Grants: Wellcome Trust; P01 GM063210 NIGMS NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Wellcome Trust (Wellcome); DOE | LDRD | Lawrence Berkeley National Laboratory (Berkeley Lab)
19	Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024;625:832-839. [PMID: 37956700 PMCID: PMC10808063 DOI: 10.1038/s41586-023-06832-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 07/07/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023] Abstract: AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations often cause population changes within these substates (refs. 3,4). We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named AF-Cluster, we investigated the evolutionary distribution of predicted structures for the metamorphic protein KaiB (ref. 5) and found that predictions of both conformations were distributed in clusters across the KaiB family. We used nuclear magnetic resonance spectroscopy to confirm an AF-Cluster prediction: a cyanobacteria KaiB variant is stabilized in the opposite state compared with the more widely studied variant. To test AF-Cluster's sensitivity to point mutations, we designed and experimentally verified a set of three mutations predicted to flip KaiB from Rhodobacter sphaeroides from the ground to the fold-switched state. Finally, screening for alternative states in protein families without known fold switching identified a putative alternative state for the oxidoreductase Mpt53 in Mycobacterium tuberculosis. Further development of such bioinformatic methods in tandem with experiments will probably have a considerable impact on predicting protein energy landscapes, essential for illuminating biological function.
Key Words: nmr spectroscopy; protein structure predictions; protein folding
MeSH Headings: Cluster Analysis; Mutation; Protein Conformation; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Sequence Alignment; Machine Learning; Rhodobacter sphaeroides; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Protein Folding
Grants: P41 GM111135 NIGMS NIH HHS; R24 GM141526 NIGMS NIH HHS; T32 GM135126 NIGMS NIH HHS
20	Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nat Commun 2023;14:8445. [PMID: 38114456 PMCID: PMC10730818 DOI: 10.1038/s41467-023-43934-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 11/24/2023] [Indexed: 12/21/2023] Abstract: The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential in enriching structural biological research and beyond. Currently, access to the database is precluded by an urgent need for tools that allow the efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour and doing so will aid our understanding of protein structure and function, while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.
Key Words: molecular modelling; machine learning; protein structure predictions
MeSH Headings: Humans; Protein Domains; Protein Structure, Tertiary; Proteins/genetics; Proteins/chemistry; Genomics; Databases, Protein
Grants: BB/T019409/1 RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
21	Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking. Nat Commun 2023;14:8283. [PMID: 38092742 PMCID: PMC10719378 DOI: 10.1038/s41467-023-43681-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Abstract: AlphaFold can predict the structures of monomeric and multimeric proteins with high accuracy but has a limit on the number of chains and residues it can fold. Here we show that a combination of AlphaFold and all-atom symmetric docking simulations enables highly accurate prediction of the structure of complex symmetrical assemblies. We present a method to predict the structure of complexes with cubic (tetrahedral, octahedral and icosahedral) symmetry from sequence. Focusing on proteins where AlphaFold can make confident predictions on the subunit structure, 27 cubic systems were assembled with a median TM-score of 0.99 and a DockQ score of 0.72. Twenty-one had TM-scores above 0.9 and were categorized as acceptable- to high-quality according to DockQ. The resulting models are energetically optimized and can be used for detailed studies of intermolecular interactions in higher-order symmetrical assemblies. The results demonstrate how explicit treatment of structural symmetry can significantly expand the size and complexity of AlphaFold predictions.
Key Words: protein structure predictions; machine learning
MeSH Headings: Protein Conformation; Proteins/metabolism
Grants: 771820 EC | EC Seventh Framework Programme | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme:Ideas; Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013))
22	Multi-domain and complex protein structure prediction using inter-domain interactions from deep learning. Commun Biol 2023;6:1221. [PMID: 38040847 PMCID: PMC10692239 DOI: 10.1038/s42003-023-05610-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023] Abstract: Accurately capturing domain-domain interactions is key to understanding protein function and designing structure-based drugs. Although AlphaFold2 has made a breakthrough on single-domain proteins, structure modeling for multi-domain proteins and complexes remains a challenge. In this study, we developed a multi-domain and complex structure assembly protocol, named DeepAssembly, based on domain segmentation and single domain modeling algorithms. Firstly, DeepAssembly uses a population-based evolutionary algorithm to assemble multi-domain proteins by inter-domain interactions inferred from a developed deep learning network. Secondly, protein complexes are assembled by means of domains rather than chains using DeepAssembly. Experimental results show that on 219 multi-domain proteins, the average inter-domain distance precision by DeepAssembly is 22.7% higher than that of AlphaFold2. Moreover, DeepAssembly improves accuracy by 13.1% for 164 multi-domain structures with low confidence deposited in the AlphaFold database. We apply DeepAssembly for the prediction of 247 heterodimers. We find that DeepAssembly successfully predicts the interface (DockQ ≥ 0.23) for 32.4% of the dimers, suggesting a lighter way to assemble complex structures by treating domains as assembly units and using inter-domain interactions learned from monomer structures.
Key Words: protein structure predictions; machine learning
MeSH Headings: Deep Learning; Proteins/chemistry; Algorithms
Grants: This work was supported by the National Key R&D Program of China [2022ZD0115103], the National Natural Science Foundation of China [62173304, 62203389], and the Key Project of Zhejiang Provincial Natural Science Foundation of China [LZ20F030002].
23	Combined NMR and molecular dynamics conformational filter identifies unambiguously dynamic ensembles of Dengue protease NS2B/NS3pro. Commun Biol 2023;6:1193. [PMID: 38001280 PMCID: PMC10673835 DOI: 10.1038/s42003-023-05584-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Abstract: The dengue protease NS2B/NS3pro has been reported to adopt either an 'open' or a 'closed' conformation. We have developed a conformational filter that combines NMR with MD simulations to identify conformational ensembles that dominate in solution. Experimental values derived from relaxation parameters for the backbone and methyl side chains were compared with the corresponding back-calculated relaxation parameters of different conformational ensembles obtained from free MD simulations. Our results demonstrate a high prevalence for the 'closed' conformational ensemble while the 'open' conformation is absent, indicating that the latter conformation is most probably due to crystal contacts. Conversely, conformational ensembles in which the positioning of the co-factor NS2B results in a 'partially' open conformation, previously described in both MD simulations and X-ray studies, were identified by our conformational filter. Altogether, we believe that our approach allows for unambiguous identification of true conformational ensembles, an essential step for reliable drug discovery.
Key Words: solution-state nmr; computational biophysics; protein structure predictions
MeSH Headings: Humans; Peptide Hydrolases; Serine Endopeptidases/chemistry; Molecular Dynamics Simulation; Protein Conformation; Viral Nonstructural Proteins/chemistry; Dengue
Grants: P41 GM111135 NIGMS NIH HHS; Stiftelsen för Strategisk Forskning (Swedish Foundation for Strategic Research); Russian Science Support Foundation Russian Science Foundation (RSF); Vetenskapsrådet (Swedish Research Council); Swedish Cancer Foundation
24	How AlphaFold2 shaped the structural coverage of the human transmembrane proteome. Sci Rep 2023;13:20283. [PMID: 37985809 PMCID: PMC10662385 DOI: 10.1038/s41598-023-47204-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/10/2023] [Indexed: 11/22/2023] Abstract: AlphaFold2 (AF2) provides a 3D structure for every known or predicted protein, opening up new prospects for virtually every field in structural biology. However, working with transmembrane protein molecules poses a notorious challenge for scientists, resulting in a limited number of experimentally determined structures. Consequently, algorithms trained on this finite training set also face difficulties. To address this issue, we recently launched the TmAlphaFold database, where predicted AlphaFold2 structures are embedded into the membrane plane and a quality assessment (plausibility of the membrane-embedded structure) is provided for each prediction using geometrical evaluation. In this paper, we analyze how AF2 has improved the structural coverage of membrane proteins compared to earlier years when only experimental structures were available, and high-throughput structure prediction was greatly limited. We also evaluate how AF2 can be used to search for (distant) homologs in highly diverse protein families. By combining quality assessment and homology search, we can pinpoint protein families where AF2 accuracy is still limited, and experimental structure determination would be desirable.
Key Words: protein structure predictions; molecular modelling
MeSH Headings: Humans; Proteome; Furylfuramide; Membrane Proteins; Algorithms; Databases, Factual
Grants: K132522 Ministry of Innovation and Technology of Hungary; 101028908 European Union's Horizon 2020
25	Enhancing AlphaFold-Multimer-based protein complex structure prediction with MULTICOM in CASP15. Commun Biol 2023;6:1140. [PMID: 37949999 PMCID: PMC10638423 DOI: 10.1038/s42003-023-05525-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023] Abstract: To enhance the AlphaFold-Multimer-based protein complex structure prediction, we developed a quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine its outputs. MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural predictions by using both traditional sequence alignments and Foldseek-based structure alignments, ranks structural predictions through multiple complementary metrics, and refines the structural predictions via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. MULTICOM_qa ranked 3rd among 26 CASP15 server predictors and MULTICOM_human ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first predictions submitted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 predictions submitted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the Foldseek Structure Alignment-based Multimer structure Generation (FSAMG) method outperforms the widely used sequence alignment-based multimer structure generation.
Key Words: protein structure predictions; computational models
MeSH Headings: Humans; Proteins/chemistry; Sequence Alignment; Benchmarking
Grants: R01 GM093123 NIGMS NIH HHS; R01 GM146340 NIGMS NIH HHS; U.S. Department of Health & Human Services | National Institutes of Health (NIH)
26	Engineering and design of promising T-cell-based multi-epitope vaccine candidates against leishmaniasis. Sci Rep 2023;13:19421. [PMID: 37940672 PMCID: PMC10632461 DOI: 10.1038/s41598-023-46408-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/31/2023] [Indexed: 11/10/2023] Abstract: Cutaneous leishmaniasis (CL) is a very common parasitic infection in subtropical areas worldwide. For decades, there have been challenges in vaccine design and vaccination against CL. The present study introduced novel T-cell-based vaccine candidates containing IFN-γ-inducing epitopic fragments from Leishmania major (L. major) glycoprotein 46 (gp46), cathepsin L-like and B-like proteases, histone H2A, glucose-regulated protein 78 (grp78) and stress-inducible protein 1 (STI-1). For this aim, top-ranked human leukocyte antigen (HLA)-specific, IFN-γ-inducing, antigenic, CD4+ and CD8+ binders were highlighted. Four vaccine candidates were generated using different spacers (AAY, GPGPG, GDGDG) and adjuvants (RS-09 peptide, human IFN-γ, a combination of both, Mycobacterium tuberculosis resuscitation-promoting factor E (RpfE)). Based on the immune simulation profile, those with RS-09 peptide (Leish-App) and RpfE (Leish-Rpf) elicited robust immune responses and their tertiary structures were further refined. Also, molecular docking of the selected vaccine models with the human toll-like receptor 4 showed proper interactions, particularly for Leish-App, for which molecular dynamics simulations showed a stable connection with TLR-4. Upon codon optimization, both models were finally ligated into the pET28a(+) vector. In conclusion, two potent multi-epitope vaccine candidates were designed against CL and evaluated using comprehensive in silico methods, while further wet experiments are also recommended.
Key Words: protein design; protein structure predictions; parasite host response
MeSH Headings: Humans; Epitopes, T-Lymphocyte; Leishmaniasis, Visceral/parasitology; Molecular Docking Simulation; T-Lymphocytes; Vaccines; Leishmaniasis, Cutaneous; Interferon-gamma; Computational Biology; Vaccines, Subunit; Epitopes, B-Lymphocyte
27	Topological links in predicted protein complex structures reveal limitations of AlphaFold. Commun Biol 2023;6:1098. [PMID: 37898666 PMCID: PMC10613300 DOI: 10.1038/s42003-023-05489-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 10/19/2023] [Indexed: 10/30/2023] Abstract: AlphaFold is making great progress in protein structure prediction, not only for single-chain proteins but also for multi-chain protein complexes. When using AlphaFold-Multimer to predict protein‒protein complexes, we observed some unusual structures in which chains are looped around each other to form topologically intertwining links at the interface. Based on physical principles, such topological links should generally not exist in native protein complex structures unless covalent modifications of residues are involved. Although it is well known and has been well studied that protein structures may have topologically complex shapes such as knots and links, existing methods are hampered by the chain closure problem and show poor performance in identifying topologically linked structures in protein‒protein complexes. Therefore, we address the chain closure problem by using sliding windows from a local perspective and propose an algorithm to measure the topological-geometric features that can be used to identify topologically linked structures. An application of the method to AlphaFold-Multimer-predicted protein complex structures finds that approximately 1.72% of the predicted structures contain topological links. The method presented in this work will facilitate the computational study of protein‒protein interactions and help further improve the structural prediction of multi-chain protein complexes.
Key Words: protein structure predictions; computational biophysics
MeSH Headings: Proteins/metabolism; Algorithms
Grants: National Natural Science Foundation of China (National Science Foundation of China); Natural Science Foundation of Zhejiang Province (Zhejiang Provincial Natural Science Foundation); “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2023C03109); Westlake Center for Genome Editing (20200000A992210/001)
28	Computational completion of the Aurora interaction region of N-Myc in the Aurora A kinase complex. Sci Rep 2023;13:18399. [PMID: 37884585 PMCID: PMC10603048 DOI: 10.1038/s41598-023-45272-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023] Abstract: Inhibiting protein-protein interactions of the Myc family is a viable pharmacological strategy for modulation of the levels of Myc oncoproteins in cancer. Aurora A kinase (AurA) and N-Myc interaction is one of the most attractive targets of this strategy because formation of this complex blocks proteasomal degradation of N-Myc in neuroblastoma. Two crystallization studies have captured this complex (PDB IDs: 5g1x, 7ztl), partially resolving the AurA interaction region (AIR) of N-Myc. Prompted by the missing N-Myc fragment in these crystal structures, we modeled the complete structure between AurA and N-Myc, and comprehensively analyzed how the incomplete and complete N-Myc behave in complex by molecular dynamics simulations. Molecular dynamics simulations of the incomplete PDB complex (5g1x) repeatedly showed partial dissociation of the short N-Myc fragment (61-89) from the kinase. The missing N-Myc (19-60) fragment was modeled utilizing the N-terminal lobe of AurA as the protein-protein interaction surface, wherein TPX2, a well-known partner of AurA, also binds. Binding free energy calculations along with flexibility analysis confirmed that the complete AIR of N-Myc stabilizes the complex, accentuating the N-terminal lobe of AurA as a binding site for the missing N-Myc fragment (19-60). We further generated additional models consisting of only the missing N-Myc (19-60), and the fused form of TPX2 (7-43) and N-Myc (61-89).
These partners also formed more stable interactions with the N-terminal lobe of AurA than did the incomplete N-Myc fragment (61-89) in the 5g1x complex. Altogether, this study provides structural insights into the involvement of the N-terminus of the AIR of N-Myc and the N-terminal lobe of AurA in formation of a stable complex, reflecting its potential for effective targeting of N-Myc.
Key Words: computational biology and bioinformatics; protein structure predictions; oncogenes
MeSH Headings: Humans; Aurora Kinase A/chemistry; Binding Sites; Epilepsy; Molecular Dynamics Simulation; Neuroblastoma; N-Myc Proto-Oncogene Protein
Grants: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu
29	HLA3DB: comprehensive annotation of peptide/HLA complexes enables blind structure prediction of T cell epitopes. Nat Commun 2023;14:6349. [PMID: 37816745 PMCID: PMC10564892 DOI: 10.1038/s41467-023-42163-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/29/2023] [Accepted: 09/29/2023] [Indexed: 10/12/2023] Abstract: The class I proteins of the major histocompatibility complex (MHC-I) display epitopic peptides derived from endogenous proteins on the cell surface for immune surveillance. Accurate modeling of peptides bound to the human MHC, HLA, has been mired by conformational diversity of the central peptide residues, which are critical for recognition by T cell receptors. Here, analysis of X-ray crystal structures within our curated database (HLA3DB) shows that pHLA complexes encompassing multiple HLA allotypes present a discrete set of peptide backbone conformations. Leveraging these backbones, we employ a regression model trained on terms of a physically relevant energy function to develop a comparative modeling approach for nonamer pHLA structures named RepPred. Our method outperforms the top pHLA modeling approach by up to 19% in structural accuracy, and consistently predicts blind targets not included in our training set. Insights from our work may be applied towards predicting antigen immunogenicity, and receptor cross-reactivity.
Key Words: molecular modelling; protein structure predictions
MeSH Headings: Humans; Epitopes, T-Lymphocyte; Peptides/chemistry; Receptors, Antigen, T-Cell; Histocompatibility Antigens; Histocompatibility Antigens Class I/metabolism
Grants: U01 DK112217 NIDDK NIH HHS; R35 GM125034 NIGMS NIH HHS; R01 AI143997 NIAID NIH HHS; CGCATF-2021/100014 Cancer Research UK; OT2 CA278687 NCI NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Division of Intramural Research, National Institute of Allergy and Infectious Diseases (Division of Intramural Research of the NIAID)
30	Enhanced antibody-antigen structure prediction from molecular docking using AlphaFold2. Sci Rep 2023;13:15107. [PMID: 37704686 PMCID: PMC10499836 DOI: 10.1038/s41598-023-42090-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 09/05/2023] [Indexed: 09/15/2023] Abstract: Predicting the structure of antibody-antigen complexes has tremendous value in biomedical research but unfortunately suffers from a poor performance in real-life applications. AlphaFold2 (AF2) has provided renewed hope for improvements in the field of protein-protein docking but has shown limited success against antibody-antigen complexes due to the lack of co-evolutionary constraints. In this study, we used physics-based protein docking methods for building decoy sets consisting of low-energy docking solutions that were either geometrically close to the native structure (positives) or not (negatives). The docking models were then fed into AF2 to assess their confidence with a novel composite score based on normalized pLDDT and pTMscore metrics after AF2 structural refinement. We show benefits of the AF2 composite score for rescoring docking poses both in terms of (1) classification of positives/negatives and of (2) success rates with particular emphasis on early enrichment. Docking models of at least medium quality present in the decoy set, but not necessarily highly ranked by docking methods, benefitted most from AF2 rescoring by experiencing large advances towards the top of the reranked list of models. These improvements, obtained without any calibration or novel methodologies, led to a notable level of performance in antibody-antigen unbound docking that was never achieved previously.
Collapse Key Words protein structure predictions machine learning Collapse MESH Headings Molecular Docking Simulation Furylfuramide Antigen-Antibody Complex Benchmarking Biological Evolution Collapse Grants Collapse
31	The net electrostatic potential and hydration of ABCG2 affect substrate transport. Nat Commun 2023;14:5035. [PMID: 37596258 PMCID: PMC10439158 DOI: 10.1038/s41467-023-40610-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 08/03/2023] [Indexed: 08/20/2023]
Abstract: ABCG2 is a medically important ATP-binding cassette transporter with crucial roles in the absorption and distribution of chemically-diverse toxins and drugs, reducing the cellular accumulation of chemotherapeutic drugs to facilitate multidrug resistance in cancer. ABCG2's capacity to transport both hydrophilic and hydrophobic compounds is not well understood. Here we assess the molecular basis for substrate discrimination by the binding pocket. Substitution of a phylogenetically-conserved polar residue, N436, to alanine in the binding pocket of human ABCG2 permits only hydrophobic substrate transport, revealing the unique role of N436 as a discriminator. Molecular dynamics simulations show that this alanine substitution alters the electrostatic potential of the binding pocket favoring hydration of the transport pore. This change affects the contact with substrates and inhibitors, abrogating hydrophilic compound transport while retaining the transport of hydrophobic compounds. The N436 residue is also required for optimal transport inhibition of ABCG2, as many inhibitors are functionally impaired by this ABCG2 mutation. Overall, these findings have biomedical implications, broadly extending our understanding of substrate and inhibitor interactions.
Key Words: membrane proteins; protein structure predictions; permeation and transport; computational biophysics
MESH Headings: Humans; Static Electricity; ATP-Binding Cassette Transporters; Alanine; Inhibition, Psychological; Molecular Dynamics Simulation; ATP Binding Cassette Transporter, Subfamily G, Member 2/genetics; Neoplasm Proteins/genetics
Grants: P01 CA096832 NCI NIH HHS; P30 CA021765 NCI NIH HHS; R01 CA194057 NCI NIH HHS; R01 CA194206 NCI NIH HHS; American Lebanese Syrian Associated Charities (ALSAC); U.S. Department of Health & Human Services | NIH | NCI | Division of Cancer Epidemiology and Genetics, National Cancer Institute (National Cancer Institute Division of Cancer Epidemiology and Genetics)
32	Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes. Nat Commun 2023;14:4935. [PMID: 37582780 PMCID: PMC10427616 DOI: 10.1038/s41467-023-40426-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/27/2023] [Accepted: 07/21/2023] [Indexed: 08/17/2023]
Abstract: Membrane proteins are encoded by approximately a quarter of human genes. Inter-chain residue-residue contact information is important for structure prediction of membrane protein complexes and valuable for understanding their molecular mechanism. Although many deep learning methods have been proposed to predict the intra-protein contacts or helix-helix interactions in membrane proteins, it is still challenging to accurately predict their inter-chain contacts due to the limited number of transmembrane proteins. Addressing the challenge, here we develop a deep transfer learning method for predicting inter-chain contacts of transmembrane protein complexes, named DeepTMP, by taking advantage of the knowledge pre-trained from a large data set of non-transmembrane proteins. DeepTMP utilizes a geometric triangle-aware module to capture the correct inter-chain interaction from the coevolution information generated by protein language models. DeepTMP is extensively evaluated on a test set of 52 self-associated transmembrane protein complexes, and compared with state-of-the-art methods including DeepHomo2.0, CDPred, GLINTER, DeepHomo, and DNCON2_Inter. It is shown that DeepTMP considerably improves the precision of inter-chain contact prediction and outperforms the existing approaches in both accuracy and robustness.
Key Words: machine learning; protein function predictions; protein structure predictions; structural biology
MESH Headings: Humans; Membrane Proteins/genetics; Membrane Proteins/chemistry; Receptors, G-Protein-Coupled; Machine Learning; Computational Biology/methods; Algorithms
Grants: National Natural Science Foundation of China (National Science Foundation of China)
33	Targeted cross-linker delivery for the in situ mapping of protein conformations and interactions in mitochondria. Nat Commun 2023;14:3882. [PMID: 37391416 PMCID: PMC10313818 DOI: 10.1038/s41467-023-39485-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 06/15/2023] [Indexed: 07/02/2023]
Abstract: Current methods for intracellular protein analysis mostly require the separation of specific organelles or changes to the intracellular environment. However, the functions of proteins are determined by their native microenvironment as they usually form complexes with ions, nucleic acids, and other proteins. Here, we show a method for in situ cross-linking and analysis of mitochondrial proteins in living cells. By using the poly(lactic-co-glycolic acid) (PLGA) nanoparticles functionalized with dimethyldioctadecylammonium bromide (DDAB) to deliver protein cross-linkers into mitochondria, we subsequently analyze the cross-linked proteins using mass spectrometry. With this method, we identify a total of 74 pairs of protein-protein interactions that do not exist in the STRING database. Interestingly, our data on mitochondrial respiratory chain proteins (~94%) are also consistent with the experimental or predicted structural analysis of these proteins. Thus, we provide a promising technology platform for in situ protein analysis in cellular organelles in their native microenvironment.
Key Words: protein-protein interaction networks; cell delivery; mitochondrial proteins; protein structure predictions; proteomics
MESH Headings: Mitochondria; Protein Conformation; Mitochondrial Membranes; Databases, Factual; Glycols
Grants: National Key R&D Program of China (grant no. 2018YFA0507703); National Natural Science Foundation of China (grant nos. 21874131, 21991082, 32088101); CAS Youth Innovation Promotion Association (grant no. Y2021058)
34	Investigating the novel-binding site of RPA2 on Menin and predicting the effect of point mutation of Menin through protein-protein interactions. Sci Rep 2023;13:9337. [PMID: 37291166 PMCID: PMC10250348 DOI: 10.1038/s41598-023-35599-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/20/2023] [Indexed: 06/10/2023]
Abstract: Protein-protein interactions (PPIs) play a critical role in all biological processes. Menin is a tumor suppressor protein that is mutated in multiple endocrine neoplasia type 1 syndrome and has been shown to interact with multiple transcription factors, including the RPA2 subunit of replication protein A (RPA). RPA is a heterotrimeric protein required for DNA repair, recombination and replication. However, the specific amino acid residues involved in the Menin-RPA2 interaction remain unclear. Thus, accurately predicting the specific amino acids involved in the interaction and the effects of MEN1 mutations on biological systems is of great interest. The experimental approaches for identifying amino acids in menin-RPA2 interactions are expensive, time-consuming, and challenging. This study leverages computational tools, free energy decomposition and a configurational entropy scheme to annotate the menin-RPA2 interaction and the effect of menin point mutation, thereby proposing a viable model of menin-RPA2 interaction. The menin-RPA2 interaction pattern was calculated on the basis of different 3D structures of menin and RPA2 complexes, constructed using homology modeling and a docking strategy, generating three best-fit models: Model 8 (-74.89 kJ/mol), Model 28 (-92.04 kJ/mol) and Model 9 (-100.4 kJ/mol). Molecular dynamics (MD) simulation was performed for 200 ns, and binding free energies and energy decomposition analysis were calculated using Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) in GROMACS. From the binding free energy change, Model 8 of Menin-RPA2 exhibited the most negative binding energy of -205.624 kJ/mol, followed by Model 28 of Menin-RPA2 with -177.382 kJ/mol. After the S606F point mutation in Menin, an increase of BFE (ΔGbind) by -34.09 kJ/mol occurs in Model 8 of mutant Menin-RPA2. Interestingly, we found a significant reduction of BFE (ΔGbind) and configurational entropy by -97.54 kJ/mol and -2618 kJ/mol in mutant Model 28 as compared to the wild type. Collectively, this is the first study to highlight the configurational entropy of protein-protein interactions, thereby strengthening the prediction of two important interaction sites in menin for the binding of RPA2. These predicted sites could be vulnerable to structural alteration in terms of binding free energy and configurational entropy after missense mutation in menin.
Key Words: computational models; computational platforms and environments; machine learning; protein design; protein structure predictions; cancer; computational biology and bioinformatics; endocrinology; oncology
MESH Headings: Humans; Point Mutation; Mutation; Transcription Factors/genetics; Binding Sites; Multiple Endocrine Neoplasia Type 1; Amino Acids/genetics; Replication Protein A/genetics
Grants: Indian Council of Medical Research
35	Sequence-structure-function relationships in the microbial protein universe. Nat Commun 2023;14:2351. [PMID: 37100781 PMCID: PMC10133388 DOI: 10.1038/s41467-023-37896-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/18/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023]
Abstract: For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.
Key Words: protein structure predictions; molecular modelling; proteins
MESH Headings: Proteins/metabolism; Amino Acid Sequence; Structure-Activity Relationship; Databases, Protein; Protein Folding
Grants: P30 DK043351 NIDDK NIH HHS
36	Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun 2023;14:2389. [PMID: 37185622 PMCID: PMC10129313 DOI: 10.1038/s41467-023-38063-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 04/21/2022] [Accepted: 04/14/2023] [Indexed: 05/17/2023]
Abstract: Antibodies have the capacity to bind a diverse set of antigens, and they have become critical therapeutics and diagnostic molecules. The binding of antibodies is facilitated by a set of six hypervariable loops that are diversified through genetic recombination and mutation. Even with recent advances, accurate structural prediction of these loops remains a challenge. Here, we present IgFold, a fast deep learning method for antibody structure prediction. IgFold consists of a pre-trained language model trained on 558 million natural antibody sequences followed by graph networks that directly predict backbone atom coordinates. IgFold predicts structures of similar or better quality than alternative methods (including AlphaFold) in significantly less time (under 25 s). Accurate structure prediction on this timescale makes possible avenues of investigation that were previously infeasible. As a demonstration of IgFold's capabilities, we predicted structures for 1.4 million paired antibody sequences, providing structural insights to 500-fold more antibodies than have experimentally determined structures.
Key Words: protein structure predictions; molecular modelling; protein databases; machine learning
MESH Headings: Protein Conformation; Deep Learning; Antibodies/chemistry; Complementarity Determining Regions/chemistry; Antigens
Grants: R35 GM141881 NIGMS NIH HHS; R01 GM078221 NIGMS NIH HHS; U.S. Department of Health & Human Services | National Institutes of Health (NIH)
37	Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins. Commun Biol 2023;6:421. [PMID: 37061613 PMCID: PMC10105775 DOI: 10.1038/s42003-023-04773-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/12/2022] [Accepted: 03/28/2023] [Indexed: 04/17/2023]
Abstract: A major goal in structural biology is to understand protein assemblies in their biologically relevant states. Here, we investigate whether AlphaFold2 structure predictions match native protein conformations. We chemically cross-linked proteins in situ within intact Tetrahymena thermophila cilia and native ciliary extracts, identifying 1,225 intramolecular cross-links within the 100 best-sampled proteins, providing a benchmark of distance restraints obeyed by proteins in their native assemblies. The corresponding structure predictions were highly concordant, positioning 86.2% of cross-linked residues within Cα-to-Cα distances of 30 Å, consistent with the cross-linker length. 43% of proteins showed no violations. Most inconsistencies occurred in low-confidence regions or between domains. Overall, AlphaFold2 predictions with lower predicted aligned error corresponded to more correct native structures. However, we observe cases where rigid body domains are oriented incorrectly, as for ciliary protein BBC118, suggesting that combining structure prediction with experimental information will better reveal biologically relevant conformations.
Key Words: molecular modelling; proteins; biophysics; proteomics; protein structure predictions
MESH Headings: Proteins/chemistry; Protein Conformation; Mass Spectrometry/methods
Grants: R01 HD085901 NICHD NIH HHS; R35 GM122480 NIGMS NIH HHS; R35 GM138348 NIGMS NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Welch Foundation; Cancer Prevention and Research Institute of Texas (Cancer Prevention Research Institute of Texas); U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
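Note on entry 37: the kind of distance-restraint check the abstract describes (the fraction of cross-linked residue pairs whose Cα-to-Cα distance falls within 30 Å) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `crosslink_satisfaction`, the dictionary layout, and the toy coordinates are all hypothetical, and only the 30 Å cutoff comes from the abstract.

```python
import math


def crosslink_satisfaction(ca_coords, crosslinks, max_dist=30.0):
    """Return the fraction of cross-links whose C-alpha atoms lie within max_dist.

    ca_coords  : dict mapping residue number -> (x, y, z) in angstroms (hypothetical layout)
    crosslinks : list of (residue_i, residue_j) pairs observed experimentally
    max_dist   : distance cutoff in angstroms; 30 is the value quoted in the abstract
    """
    if not crosslinks:
        raise ValueError("at least one cross-link is required")
    satisfied = sum(
        1
        for res_i, res_j in crosslinks
        if math.dist(ca_coords[res_i], ca_coords[res_j]) <= max_dist
    )
    return satisfied / len(crosslinks)


# Toy model: four residues with made-up coordinates (not real data).
toy_coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (0.0, 40.0, 0.0), 4: (0.0, 0.0, 25.0)}
toy_links = [(1, 2), (1, 3), (1, 4), (2, 4)]
fraction = crosslink_satisfaction(toy_coords, toy_links)
print(f"{fraction:.2%} of cross-links satisfied")
```

In this toy case only the (1, 3) pair, at 40 Å, violates the cutoff. A real analysis would read Cα coordinates from the predicted model files and would also have to decide how to treat low-confidence regions.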
38	Accurate prediction by AlphaFold2 for ligand binding in a reductive dehalogenase and implications for PFAS (per- and polyfluoroalkyl substance) biodegradation. Sci Rep 2023;13:4082. [PMID: 36906658 PMCID: PMC10008544 DOI: 10.1038/s41598-023-30310-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 02/21/2023] [Indexed: 03/13/2023]
Abstract: Despite the success of AlphaFold2 (AF2), it is unclear how AF2 models accommodate for ligand binding. Here, we start with a protein sequence from Acidimicrobiaceae TMED77 (T7RdhA) with potential for catalyzing the degradation of per- and polyfluoroalkyl substances (PFASs). AF2 models and experiments identified T7RdhA as a corrinoid iron-sulfur protein (CoFeSP) which uses a norpseudo-cobalamin (BVQ) cofactor and two Fe₄S₄ iron-sulfur clusters for catalysis. Docking and molecular dynamics simulations suggest that T7RdhA uses perfluorooctanoic acetate (PFOA) as a substrate, supporting the reported defluorination activity of its homolog, A6RdhA. We showed that AF2 provides processual (dynamic) predictions for the binding pockets of ligands (cofactors and/or substrates). Because the pLDDT scores provided by AF2 reflect the protein native states in complex with ligands as the evolutionary constraints, the Evoformer network of AF2 predicts protein structures and residue flexibility in complex with the ligands, i.e., in their native states. Therefore, an apo-protein predicted by AF2 is actually a holo-protein awaiting ligands.
Key Words: protein analysis; protein folding; protein function predictions; protein structure predictions
39	Effect of Fc core fucosylation and light chain isotype on IgG1 flexibility. Commun Biol 2023;6:237. [PMID: 36869088 PMCID: PMC9982779 DOI: 10.1038/s42003-023-04622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 02/21/2023] [Indexed: 03/05/2023]
Abstract: N-glycosylation plays a key role in modulating the bioactivity of monoclonal antibodies (mAbs), and the light chain (LC) isotype can also influence their physicochemical properties. However, investigating the impact of such features on mAb conformational behavior is a big challenge, due to the very high flexibility of these biomolecules. In this work we investigate, by accelerated molecular dynamics (aMD), the conformational behavior of two commercial immunoglobulins G1 (IgG1), representative of κ and λ LC antibodies, in both their fucosylated and afucosylated forms. Our results show, through the identification of a stable conformation, how the combination of fucosylation and LC isotype modulates the hinge behavior, the Fc conformation and the position of the glycan chains, all factors potentially affecting the binding to the FcγRs. This work also represents a technological enhancement in the conformational exploration of mAbs, making aMD a suitable approach to clarify experimental results.
Key Words: protein function predictions; protein structure predictions
MESH Headings: Glycosylation; Immunoglobulin G; Antibodies, Monoclonal; Technology
Grants: 2018–2022 Ministero dell'Istruzione, dell'Università e della Ricerca (Ministry of Education, University and Research); Fondazione Invernizzi (Grant Reference Number: LIB_FOND_COVID_19_01 project)
40	In silico design of a polypeptide as a vaccine candidate against ascariasis. Sci Rep 2023;13:3504. [PMID: 36864139 PMCID: PMC9981566 DOI: 10.1038/s41598-023-30445-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 02/23/2023] [Indexed: 03/04/2023]
Abstract: Ascariasis is the most prevalent zoonotic helminthic disease worldwide, and is responsible for nutritional deficiencies, particularly hindering the physical and neurological development of children. The appearance of anthelmintic resistance in Ascaris is a risk for the target of eliminating ascariasis as a public health problem by 2030 set by the World Health Organisation. The development of a vaccine could be key to achieving this target. Here we have applied an in silico approach to design a multi-epitope polypeptide that contains T-cell and B-cell epitopes of reported novel potential vaccination targets, alongside epitopes from established vaccination candidates. An artificial toll-like receptor-4 (TLR4) adjuvant (RS09) was added to improve immunogenicity. The constructed peptide was found to be non-allergic, non-toxic, with adequate antigenic and physicochemical characteristics, such as solubility and potential expression in Escherichia coli. A tertiary structure of the polypeptide was used to predict the presence of discontinuous B-cell epitopes and to confirm the molecular binding stability with TLR2 and TLR4 molecules. Immune simulations predicted an increase in B-cell and T-cell immune response after injection. This polypeptide can now be validated experimentally and compared to other vaccine candidates to assess its possible impact in human health.
Key Words: protein design; protein structure predictions; peptide vaccines; parasitic infection
41	Structural details of a Class B GPCR-arrestin complex revealed by genetically encoded crosslinkers in living cells. Nat Commun 2023;14:1151. [PMID: 36859440 PMCID: PMC9977954 DOI: 10.1038/s41467-023-36797-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 02/16/2023] [Indexed: 03/03/2023]
Abstract: Understanding the molecular basis of arrestin-mediated regulation of GPCRs is critical for deciphering signaling mechanisms and designing functional selectivity. However, structural studies of GPCR-arrestin complexes are hampered by their highly dynamic nature. Here, we dissect the interaction of arrestin-2 (arr2) with the secretin-like parathyroid hormone 1 receptor PTH1R using genetically encoded crosslinking amino acids in live cells. We identify 136 intermolecular proximity points that guide the construction of energy-optimized molecular models for the PTH1R-arr2 complex. Our data reveal flexible receptor elements missing in existing structures, including intracellular loop 3 and the proximal C-tail, and suggest a functional role of a hitherto overlooked positively charged region at the arrestin N-edge. Unbiased MD simulations highlight the stability and dynamic nature of the complex. Our integrative approach yields structural insights into protein-protein complexes in a biologically relevant live-cell environment and provides information inaccessible to classical structural methods, while also revealing the dynamics of the system.
Key Words: G protein-coupled receptors; molecular modelling; computational biophysics; protein structure predictions
42	Evaluating native-like structures of RNA-protein complexes through the deep learning method. Nat Commun 2023;14:1060. [PMID: 36828844 PMCID: PMC9958188 DOI: 10.1038/s41467-023-36720-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/25/2022] [Accepted: 02/14/2023] [Indexed: 02/26/2023]
Abstract: RNA-protein complexes underlie numerous cellular processes, including basic translation and gene regulation. The high-resolution structure determination of the RNA-protein complexes is essential for elucidating their functions. Therefore, computational methods capable of identifying the native-like RNA-protein structures are needed. To address this challenge, we thus develop DRPScore, a deep-learning-based approach for identifying native-like RNA-protein structures. DRPScore is tested on representative sets of RNA-protein complexes with various degrees of binding-induced conformation change ranging from fully rigid docking (bound-bound) to fully flexible docking (unbound-unbound). Out of the top 20 predictions, DRPScore selects native-like structures with a success rate of 91.67% on the testing set of bound RNA-protein complexes and 56.14% on the unbound complexes. DRPScore consistently outperforms existing methods with a roughly 10.53-15.79% improvement, even for the most difficult unbound cases. Furthermore, DRPScore significantly improves the accuracy of the native interface interaction predictions. DRPScore should be broadly useful for modeling and designing RNA-protein complexes.
Key Words: RNA; computational biology and bioinformatics; molecular modelling; protein structure predictions; machine learning
MESH Headings: Deep Learning; Protein Binding; Models, Molecular; Proteins/metabolism; RNA/metabolism; Protein Conformation; Molecular Docking Simulation; Algorithms
Grants: National Natural Science Foundation of China (National Science Foundation of China); Fundamental Research Funds for the Central Universities (CCNU22QN004)
43	Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks. Sci Rep 2023;13:2837. [PMID: 36808182 PMCID: PMC9936485 DOI: 10.1038/s41598-023-30052-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 02/15/2023] [Indexed: 02/19/2023]
Abstract: The structure of proteins impacts directly on the function they perform. Mutations in the primary sequence can provoke structural changes with consequent modification of functional properties. SARS-CoV-2 proteins have been extensively studied during the pandemic. This wide dataset, related to sequence and structure, has enabled joint sequence-structure analysis. In this work, we focus on the SARS-CoV-2 S (Spike) protein and the relations between sequence mutations and structure variations, in order to shed light on the structural changes stemming from the position of mutated amino acid residues in three different SARS-CoV-2 strains. We propose the use of protein contact network (PCN) formalism to: (i) obtain a global metric space and compare various molecular entities, (ii) give a structural explanation of the observed phenotype, and (iii) provide context dependent descriptors of single mutations. PCNs have been used to compare sequence and structure of the Alpha, Delta, and Omicron SARS-CoV-2 variants, and we found that Omicron has a unique mutational pattern leading to different structural consequences from mutations of other strains. The non-random distribution of changes in network centrality along the chain has made it possible to shed light on the structural (and functional) consequences of mutations.
Key Words: computational models; network topology; protein structure predictions; proteins
44	AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023;6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023]
Abstract: Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
Key Words: protein structure predictions; protein analysis
45	Protein structure prediction has reached the single-structure frontier. Nat Methods 2023;20:170-173. [PMID: 36639584 PMCID: PMC9839224 DOI: 10.1038/s41592-022-01760-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Indexed: 01/15/2023]
Key Words: protein structure predictions; structure determination
MESH Headings: Proteins; Protein Conformation
Grants: Helmholtz Association
46	AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 2023;20:205-213. [PMID: 36424442 PMCID: PMC9911346 DOI: 10.1038/s41592-022-01685-y] [Citation(s) in RCA: 121] [Impact Index Per Article: 121.0] [Received: 12/10/2021] [Accepted: 10/18/2022] [Indexed: 11/27/2022]
Abstract: Artificial intelligence-based protein structure prediction approaches have had a transformative effect on biomolecular sciences. The predicted protein models in the AlphaFold protein structure database, however, all lack coordinates for small molecules, essential for molecular structure or function: hemoglobin lacks bound heme; zinc-finger motifs lack zinc ions essential for structural integrity and metalloproteases lack metal ions needed for catalysis. Ligands important for biological function are absent too; no ADP or ATP is bound to any of the ATPases or kinases. Here we present AlphaFill, an algorithm that uses sequence and structure similarity to 'transplant' such 'missing' small molecules and ions from experimentally determined structures to predicted protein models. The algorithm was successfully validated against experimental structures. A total of 12,029,789 transplants were performed on 995,411 AlphaFold models and are available together with associated validation metrics in the alphafill.eu databank, a resource to help scientists make new hypotheses and design targeted experiments.
Key Words: protein databases; protein structure predictions
47	Convolutional networks for supervised mining of molecular patterns within cellular context. Nat Methods 2023;20:284-294. [PMID: 36690741 PMCID: PMC9911354 DOI: 10.1038/s41592-022-01746-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 04/12/2022] [Accepted: 12/02/2022] [Indexed: 01/24/2023] Abstract: Cryo-electron tomograms capture a wealth of structural information on the molecular constituents of cells and tissues. We present DeePiCt (deep picker in context), an open-source deep-learning framework for supervised segmentation and macromolecular complex localization in cryo-electron tomography. To train and benchmark DeePiCt on experimental data, we comprehensively annotated 20 tomograms of Schizosaccharomyces pombe for ribosomes, fatty acid synthases, membranes, nuclear pore complexes, organelles, and cytosol. By comparing DeePiCt to state-of-the-art approaches on this dataset, we show its unique ability to identify low-abundance and low-density complexes. We use DeePiCt to study compositionally distinct subpopulations of cellular ribosomes, with emphasis on their contextual association with mitochondria and the endoplasmic reticulum. Finally, applying pre-trained networks to a HeLa cell tomogram demonstrates that DeePiCt achieves high-quality predictions in unseen datasets from different biological species in a matter of minutes. The comprehensively annotated experimental data and pre-trained networks are provided for immediate use by the community. Key Words: protein structure predictions; molecular imaging; data mining; image processing.
48	Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling. Nat Commun 2022;13:7846. [PMID: 36543826 PMCID: PMC9772387 DOI: 10.1038/s41467-022-35593-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/11/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022] Abstract: Covalent labeling (CL) in combination with mass spectrometry can be used as an analytical tool to study and determine structural properties of protein-protein complexes. However, data from these experiments is sparse and does not unambiguously elucidate protein structure. Thus, computational algorithms are needed to deduce structure from the CL data. In this work, we present a hybrid method that combines models of protein complex subunits generated with AlphaFold with differential CL data via a CL-guided protein-protein docking in Rosetta. In a benchmark set, the RMSD (root-mean-square deviation) of the best-scoring models was below 3.6 Å for 5/5 complexes with inclusion of CL data, whereas the same quality was only achieved for 1/5 complexes without CL data. This study suggests that our integrated approach can successfully use data obtained from CL experiments to distinguish between nativelike and non-nativelike models. Key Words: protein structure predictions; computational biophysics; mass spectrometry. MeSH Headings: Protein Conformation; Proteins/chemistry; Algorithms; Mass Spectrometry. Grants: P41 GM128577 NIGMS NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Alfred P. Sloan Foundation.
49	Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria. Nat Commun 2022;13:7861. [PMID: 36543797 PMCID: PMC9772386 DOI: 10.1038/s41467-022-35523-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Accepted: 12/07/2022] [Indexed: 12/24/2022] Abstract: Ancestral metabolism has remained controversial due to a lack of evidence beyond sequence-based reconstructions. Although prebiotic chemists have provided hints that metabolism might originate from non-enzymatic protometabolic pathways, gaps between ancestral reconstruction and prebiotic processes mean there is much that is still unknown. Here, we apply proteome-wide 3D structure predictions and comparisons to investigate ancestorial metabolism of ancient bacteria and archaea, to provide information beyond sequence as a bridge to the prebiotic processes. We compare representative bacterial and archaeal strains, which reveal surprisingly similar physiological and metabolic characteristics via microbiological and biophysical experiments. Pairwise comparison of protein structures identify the conserved metabolic modules in bacteria and archaea, despite interference from overly variable sequences. The conserved modules (for example, middle of glycolysis, partial TCA, proton/sulfur respiration, building block biosynthesis) constitute the basic functions that possibly existed in the archaeal-bacterial common ancestor, which are remarkably consistent with the experimentally confirmed protometabolic pathways. These structure-based findings provide a new perspective to reconstructing the ancestral metabolism and understanding its origin, which suggests high-throughput protein 3D structure prediction is a promising approach, deserving broader application in future ancestral exploration. Key Words: protein structure predictions; archaeal evolution; bacterial evolution.
50	Pan-kinome of Legionella expanded by a bioinformatics survey. Sci Rep 2022;12:21782. [PMID: 36526881 PMCID: PMC9758233 DOI: 10.1038/s41598-022-26109-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Abstract: The pathogenic Legionella bacteria are notorious for delivering numerous effector proteins into the host cell with the aim of disturbing and hijacking cellular processes for their benefit. Despite intensive studies, many effectors remain uncharacterized. Motivated by the richness of Legionella effector repertoires and their oftentimes atypical biochemistry, also by several known atypical Legionella effector kinases and pseudokinases discovered recently, we undertook an in silico survey and exploration of the pan-kinome of the Legionella genus, i.e., the union of the kinomes of individual species. In this study, we discovered 13 novel (pseudo)kinase families (all are potential effectors) with the use of non-standard bioinformatic approaches. Together with 16 known families, we present a catalog of effector and non-effector protein kinase-like families within Legionella, available at http://bioinfo.sggw.edu.pl/kintaro/. We analyze and discuss the likely functional roles of the novel predicted kinases. Notably, some of the kinase families are also present in other bacterial taxa, including other pathogens, often phylogenetically very distant from Legionella. This work highlights Nature's ingeniousness in the pathogen-host arms race and offers a useful resource for the study of infection mechanisms. Key Words: kinases; biochemistry; microbiology; pathogens; protein function predictions; protein structure predictions; computational biology and bioinformatics; protein sequence analyses.