151
Hon-Nami K, Hijikata A, Yura K, Bessho Y. Whole genome analyses for c-type cytochromes associated with respiratory chains in the extreme thermophile, Thermus thermophilus. J Gen Appl Microbiol 2023; 69:68-78. [PMID: 37394433 DOI: 10.2323/jgam.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
In thermophilic microorganisms, c-type cytochrome (cyt) proteins mainly function in the respiratory chain as electron carriers. Genome analyses at the beginning of this century revealed a variety of genes harboring the heme c motif. Here, we describe the results of surveying genes with the heme c motif, CxxCH, in a genome database comprising four strains of Thermus thermophilus, including strain HB8, and the confirmation of 19 c-type cytochromes among 27 selected genes. We analyzed the 19 genes, including the expression of four, by a bioinformatics approach to elucidate their individual attributes. One of the approaches included an analysis based on the secondary structure alignment pattern between the heme c motif and the 6th ligand. The predicted structures revealed many cyt c domains with fewer β-strands, such as mitochondrial cyt c, in addition to the β-strand unique to Thermus inserted in cyt c domains, as in T. thermophilus cyt c552 and caa3 cyt c oxidase subunit IIc. The surveyed thermophiles harbor potential proteins with a variety of cyt c folds. The gene analyses led to the development of an index for the classification of cyt c domains. Based on these results, we propose names for T. thermophilus genes harboring the cyt c fold.
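The survey above starts from a simple sequence pattern. As a minimal sketch (not the authors' pipeline), the CxxCH heme c motif can be located with Python's `re` module; a lookahead is used so that adjacent or overlapping motifs are all reported. The function names and toy sequences are hypothetical, and a motif hit is only a candidate: the study confirmed 19 c-type cytochromes among 27 selected genes, so hits still need further analysis.

```python
import re

# Heme c binding motif: Cys, any two residues, Cys, His (CxxCH).
# The zero-width lookahead lets overlapping occurrences all be found.
HEME_C_MOTIF = re.compile(r"(?=(C..CH))")


def find_heme_c_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every CxxCH occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HEME_C_MOTIF.finditer(seq)]


def count_candidates(proteins: dict[str, str]) -> dict[str, int]:
    """Map each protein ID to its number of CxxCH motifs, keeping only proteins with hits."""
    counts = {pid: len(find_heme_c_motifs(seq)) for pid, seq in proteins.items()}
    return {pid: n for pid, n in counts.items() if n > 0}
```

For example, `find_heme_c_motifs("CACCHCH")` reports two overlapping motifs, at positions 1 and 3.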
Affiliation(s)
- Atsushi Hijikata
  - School of Life Sciences, Tokyo University of Pharmacy and Life Sciences
- Kei Yura
  - Graduate School of Humanities and Sciences, Ochanomizu University
  - Center for Interdisciplinary AI and Data Science, Ochanomizu University
  - Graduate School of Advanced Science and Engineering, Waseda University
- Yoshitaka Bessho
  - Center for Interdisciplinary AI and Data Science, Ochanomizu University
  - RIKEN SPring-8 Center, Harima Institute

152
Falkenberg F, Kohn S, Bott M, Bongaerts J, Siegert P. Biochemical characterisation of a novel broad pH spectrum subtilisin from Fictibacillus arsenicus DSM 15822T. FEBS Open Bio 2023; 13:2035-2046. [PMID: 37649135 PMCID: PMC10626276 DOI: 10.1002/2211-5463.13701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/23/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023]
Abstract
Subtilisins from microbial sources, especially from the Bacillaceae family, are of particular interest for biotechnological applications and serve the currently growing enzyme market as efficient and novel biocatalysts. Biotechnological applications include use in detergents, cosmetics, leather processing, wastewater treatment and pharmaceuticals. To identify a possible candidate for the enzyme market, here we cloned the gene of the subtilisin SPFA from Fictibacillus arsenicus DSM 15822T (obtained through a data mining-based search) and expressed it in Bacillus subtilis DB104. After production and purification, the protease showed a molecular mass of 27.57 kDa and a pI of 5.8. SPFA displayed hydrolytic activity with a temperature optimum of 80 °C and a very broad pH optimum between 8.5 and 11.5, with high activity up to pH 12.5. SPFA showed no NaCl dependence but a high NaCl tolerance, with activity decreasing as NaCl concentrations rose to 5 M. Its stability increased with increasing NaCl concentration. Based on its substrate preference among 10 synthetic peptide 4-nitroanilide substrates with three or four amino acids and on its phylogenetic classification, SPFA can be assigned to the subgroup of true subtilisins. Moreover, SPFA exhibited high tolerance to 5% (w/v) SDS and 5% (v/v) H2O2. The biochemical properties of SPFA, especially its tolerance of remarkably high pH, SDS and H2O2, suggest it has potential for biotechnological applications.
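For reference, a theoretical pI can be estimated from sequence alone by finding the pH at which the Henderson-Hasselbalch net charge is zero; the sketch below does this by bisection. The pKa table is an illustrative assumption (published tools use different pKa sets and therefore give somewhat different values), and the abstract does not state how the reported pI of 5.8 was obtained.

```python
# Illustrative pKa values (an assumption; tools such as EMBOSS or ExPASy use other sets).
POSITIVE_PKA = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a polypeptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free N- and C-terminus per chain
    # Fraction protonated for basic groups, fraction deprotonated for acidic groups.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (charge decreases monotonically with pH)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```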
Affiliation(s)
- Fabian Falkenberg
  - Institute of Nano‐ and Biotechnologies, Aachen University of Applied Sciences, Jülich, Germany
- Sophie Kohn
  - Institute of Nano‐ and Biotechnologies, Aachen University of Applied Sciences, Jülich, Germany
- Michael Bott
  - Institute of Bio‐ and Geosciences, IBG‐1: Biotechnology, Forschungszentrum Jülich, Germany
- Johannes Bongaerts
  - Institute of Nano‐ and Biotechnologies, Aachen University of Applied Sciences, Jülich, Germany
- Petra Siegert
  - Institute of Nano‐ and Biotechnologies, Aachen University of Applied Sciences, Jülich, Germany

153
Bale A, Rambo R, Prior C. The SKMT Algorithm: A method for assessing and comparing underlying protein entanglement. PLoS Comput Biol 2023; 19:e1011248. [PMID: 38011290 PMCID: PMC10703313 DOI: 10.1371/journal.pcbi.1011248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 12/07/2023] [Accepted: 11/06/2023] [Indexed: 11/29/2023]
Abstract
We present fast and simple-to-implement measures of the entanglement of protein tertiary structures that are appropriate for highly flexible structure comparison. These are computed using the SKMT algorithm, a novel method of smoothing the Cα backbone to obtain a minimal-complexity curve representation of the way the protein's secondary structure elements fold to form its tertiary structure. The complexity of this curve is then characterised using measures based on the writhe and crossing number, quantities heavily used in DNA topology studies that have recently shown promising results when applied to proteins. The SKMT smoothing is used to derive empirical bounds on a protein's entanglement relative to its number of secondary structure elements. We show that large-scale helical geometries are the dominant contributor to the maximum growth in entanglement of protein monomers, and further that this large-scale helical geometry is present in a large array of proteins, consistently across a number of different protein structure types and sequences. We also show how these bounds can be used to constrain the search space of protein structure prediction from small-angle X-ray scattering experiments, a method highly suited to determining the likely structure of proteins in solution, where crystal structures or machine-learning-based predictions often fail to match experimental data. Finally, we develop a structural comparison metric based on the SKMT smoothing, which is used in one specific case to demonstrate significant structural similarity between Rossmann fold and TIM barrel proteins, a link that is potentially significant because past attempts to engineer the latter have produced the former. We provide the SWRITHE interactive Python notebook to calculate these metrics.
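To make the writhe and crossing-number measures concrete, here is a minimal sketch of the Gauss double integral evaluated on a polyline with a crude midpoint rule. It is not the SKMT smoothing and not the SWRITHE notebook: the input is assumed to be an already smoothed curve, the function name is hypothetical, and exact per-segment-pair formulas are preferable in practice.

```python
import numpy as np


def writhe_and_acn(points: np.ndarray) -> tuple[float, float]:
    """Midpoint approximation of writhe and average crossing number (ACN).

    points: (N, 3) array of curve vertices, e.g. a smoothed C-alpha trace.
    Wr  = 1/(4*pi) * sum over segment pairs of (dr_i x dr_j) . (r_i - r_j) / |r_i - r_j|^3
    ACN = the same sum with the absolute value of each term.
    """
    segments = np.diff(points, axis=0)            # segment vectors dr_i
    midpoints = (points[:-1] + points[1:]) / 2    # representative point r_i of each segment
    n = len(segments)
    wr, acn = 0.0, 0.0
    for i in range(n):
        # Adjacent segments are skipped: with midpoints, their separation lies in the
        # plane of the two segments, so the term is exactly zero anyway.
        for j in range(i + 2, n):
            r = midpoints[i] - midpoints[j]
            dist = np.linalg.norm(r)
            term = np.dot(np.cross(segments[i], segments[j]), r) / dist**3
            wr += term
            acn += abs(term)
    # The integrand is symmetric in (i, j); each unordered pair is counted once above.
    return float(2 * wr / (4 * np.pi)), float(2 * acn / (4 * np.pi))
```

Two properties are useful sanity checks: a planar curve has zero writhe, and mirroring a curve flips the sign of its writhe while leaving the ACN unchanged.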
Affiliation(s)
- Arron Bale
  - Department of Mathematical Sciences, Durham University, Durham, United Kingdom
- Robert Rambo
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, United Kingdom
- Christopher Prior
  - Department of Mathematical Sciences, Durham University, Durham, United Kingdom

154
Madaloz TZ, Dos Santos K, Zacchi FL, Bainy ACD, Razzera G. Nuclear receptor superfamily structural diversity in pacific oyster: In silico identification of estradiol binding candidates. Chemosphere 2023; 340:139877. [PMID: 37619748 DOI: 10.1016/j.chemosphere.2023.139877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 07/21/2023] [Accepted: 08/17/2023] [Indexed: 08/26/2023]
Abstract
The increasing presence of anthropogenic contaminants in aquatic environments poses challenges for species inhabiting contaminated sites. Because nuclear receptors bind ligands that inhibit or activate gene transcription, the nuclear receptor superfamily is a frequent target of these xenobiotic compounds. The present work aims to understand the potential interaction between the hormone 17-β-estradiol, an environmental contaminant, and the nuclear receptors of Crassostrea gigas, the Pacific oyster. This sessile, filter-feeding species is subject to environmental changes and exposure to contaminants. In the Pacific oyster, the estrogen receptor is unable to bind this hormone as its vertebrate counterpart does. However, another receptor may respond to estrogen-like molecules and their derivatives. We employed high-performance in silico methodologies, including three-dimensional modeling, molecular docking and atomistic molecular dynamics, to identify likely binding candidates for the target molecule. Our approach revealed that, within the C. gigas nuclear receptor superfamily, the candidates with the most favorable interactions with the molecule of interest belonged to the NR1D, NR1H, NR1P, NR2E, NHR42, and NR0B groups. Interestingly, NR1H and NR0B were associated with planktonic/larval life stages, while NR1P, NR2E, and NR0B were associated with sessile/adult life stages. This computational strategy showed high performance in the virtual screening of candidates for binding the target xenobiotic molecule and can be employed in other ecotoxicology studies of non-model organisms.
Affiliation(s)
- Tâmela Zamboni Madaloz
  - Programa de Pós-Graduação Em Bioquímica, Departamento de Bioquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil; Laboratório de Biomarcadores de Contaminação Aquática e Imunoquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil
- Karin Dos Santos
  - Programa de Pós-Graduação Em Bioquímica, Departamento de Bioquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil; Laboratório de Biomarcadores de Contaminação Aquática e Imunoquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil
- Flávia Lucena Zacchi
  - Laboratório de Moluscos Marinhos, Universidade Federal de Santa Catarina, Florianópolis, SC, 88061-600, Brazil
- Afonso Celso Dias Bainy
  - Programa de Pós-Graduação Em Bioquímica, Departamento de Bioquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil; Laboratório de Biomarcadores de Contaminação Aquática e Imunoquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil
- Guilherme Razzera
  - Programa de Pós-Graduação Em Bioquímica, Departamento de Bioquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil; Laboratório de Biomarcadores de Contaminação Aquática e Imunoquímica, Universidade Federal de Santa Catarina, Florianópolis, SC, 88040-900, Brazil

155
Malbranke C, Rostain W, Depardieu F, Cocco S, Monasson R, Bikard D. Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment. PLoS Comput Biol 2023; 19:e1011621. [PMID: 37976326 PMCID: PMC10729993 DOI: 10.1371/journal.pcbi.1011621] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/25/2023] [Revised: 12/19/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023]
Abstract
We present here an approach to protein design that combines (i) scarce functional information, such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). Our approach is applied to a domain of the Cas9 protein responsible for recognizing a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences from the wild type (20% of the protein domain) retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed improved activity compared with the original wild-type protein sequence. These results demonstrate the value of further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.
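As a rough illustration of how an RBM scores a sequence, the sketch below computes the free energy of an aligned protein sequence under an RBM with one-hot (Potts-like) visible units and Bernoulli hidden units; lower free energy means the model finds the sequence more family-like. Bernoulli hidden units are chosen here only for brevity (protein RBMs in the literature often use other hidden-unit potentials, e.g. Gaussian or dReLU), and the parameter names `g`, `W`, `b` are hypothetical. Training, semi-supervision and the FoldX coupling described above are not reproduced.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids + alignment gap (q = 21)


def one_hot(seq: str) -> np.ndarray:
    """Encode an aligned sequence as an (L, q) one-hot matrix."""
    idx = [ALPHABET.index(a) for a in seq]
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def free_energy(seq: str, g: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """F(v) = -sum_i g_i(v_i) - sum_mu log(1 + exp(b_mu + I_mu(v))).

    g: (L, q) visible fields, W: (M, L, q) couplings, b: (M,) hidden biases.
    Summing out binary hidden units yields the softplus term for each unit mu.
    """
    v = one_hot(seq)
    inputs = b + np.einsum("mlq,lq->m", W, v)  # I_mu(v) for each hidden unit
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return float(-np.sum(g * v) - np.sum(np.logaddexp(0.0, inputs)))
```

With all parameters set to zero, every sequence has the same free energy, -M log 2; fields and couplings then shift it sequence by sequence.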
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- William Rostain
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- Florence Depardieu
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
- David Bikard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France

156
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSAs), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but that none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose a new hybrid protein sequence and structure similarity score, PC_sim, based on their main principal component. The corresponding divergence measure, PC_div, shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali, which constructs protein MSAs either de novo or by modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of pairwise alignments (PAs) and refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, the highest PC_sim score, and the highest similarity both with the MSAs produced by other tools and with the Balibase reference MSAs. AVAILABILITY AND IMPLEMENTATION https://github.com/ugobas/PC_ali.
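The idea of combining four correlated similarity measures through their main principal component can be sketched with NumPy's SVD. This is a minimal illustration, not PC_ali's code: centering without scaling, the sign convention, and the function name are assumptions, and the returned scores are centered projections rather than a calibrated similarity, so how PC_sim is normalized and how PC_div is derived from it are not reproduced.

```python
import numpy as np


def first_principal_component(similarities: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project pairwise similarity profiles onto their first principal component.

    similarities: (n_pairs, 4) array; columns could be aligned fraction, sequence
    identity, fraction of superimposed residues and contact overlap.
    Returns (loadings, scores).
    """
    centered = similarities - similarities.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:
        # The sign of a singular vector is arbitrary; choose it so that
        # higher similarities map to higher scores.
        loadings = -loadings
    scores = centered @ loadings
    return loadings, scores
```

When the four measures are perfectly correlated, each receives an equal loading of 0.5 and the score simply ranks pairs by similarity; real data would give unequal loadings reflecting how the measures covary.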
Affiliation(s)
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- David Abia
  - Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Oscar Piette
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain

157
Genz LR, Mulvaney T, Nair S, Topf M. PICKLUSTER: a protein-interface clustering and analysis plug-in for UCSF ChimeraX. Bioinformatics 2023; 39:btad629. [PMID: 37846034 PMCID: PMC10629935 DOI: 10.1093/bioinformatics/btad629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/07/2023] [Accepted: 10/13/2023] [Indexed: 10/18/2023]
Abstract
SUMMARY The identification and characterization of interfaces in protein complexes is crucial for understanding the mechanisms of molecular recognition. These interfaces are also attractive targets for protein inhibition. However, targeting protein interfaces can be challenging for large interfaces that consist of multiple interacting regions. We present PICKLUSTER (Protein Interface C(K)luster), a program for identifying "sub-interfaces" in protein-protein complexes using distance clustering. Dividing the interface into smaller "sub-interfaces" offers a more focused approach to targeting protein-protein interfaces. AVAILABILITY AND IMPLEMENTATION PICKLUSTER is implemented as a plug-in for the molecular visualization program UCSF ChimeraX 1.4 and later versions. It is freely available for download from the ChimeraX Toolshed and at https://gitlab.com/topf-lab/pickluster.git.
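The general idea of splitting an interface into sub-interfaces by distance clustering can be sketched in two steps: select the residues of one chain that lie near the partner chain, then group them by single-linkage clustering. This is only an illustration under stated assumptions (one representative point per residue, arbitrary 5 Å and 8 Å thresholds, hypothetical function names); the abstract does not specify the interface definition, clustering algorithm or parameters PICKLUSTER actually uses.

```python
import numpy as np


def interface_residues(coords_a: np.ndarray, coords_b: np.ndarray, cutoff: float = 5.0) -> np.ndarray:
    """Indices of chain-A residues with a representative point within `cutoff` Å of chain B."""
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    return np.where((d < cutoff).any(axis=1))[0]


def single_linkage_clusters(points: np.ndarray, link_distance: float = 8.0) -> list[int]:
    """Label points so that members of a cluster are chained by distances <= link_distance."""
    n = len(points)
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] <= link_distance:
                parent[find(i)] = find(j)  # merge the two clusters
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    mapping: dict[int, int] = {}
    return [mapping.setdefault(find(i), len(mapping)) for i in range(n)]
```

Each resulting label then corresponds to one candidate sub-interface.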
Affiliation(s)
- Luca R Genz
  - Leibniz-Institut für Virologie (LIV), 20251 Hamburg, Germany
  - Centre for Structural Systems Biology (CSSB), 22607 Hamburg, Germany
- Thomas Mulvaney
  - Leibniz-Institut für Virologie (LIV), 20251 Hamburg, Germany
  - Centre for Structural Systems Biology (CSSB), 22607 Hamburg, Germany
  - Universitätsklinikum Hamburg Eppendorf (UKE), 20246 Hamburg, Germany
- Sanjana Nair
  - Leibniz-Institut für Virologie (LIV), 20251 Hamburg, Germany
  - Centre for Structural Systems Biology (CSSB), 22607 Hamburg, Germany
- Maya Topf
  - Leibniz-Institut für Virologie (LIV), 20251 Hamburg, Germany
  - Centre for Structural Systems Biology (CSSB), 22607 Hamburg, Germany
  - Universitätsklinikum Hamburg Eppendorf (UKE), 20246 Hamburg, Germany

158
Kilian M, Bischofs IB. Co-evolution at protein-protein interfaces guides inference of stoichiometry of oligomeric protein complexes by de novo structure prediction. Mol Microbiol 2023; 120:763-782. [PMID: 37777474 DOI: 10.1111/mmi.15169] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023]
Abstract
The quaternary structure of a protein complex, with its specific stoichiometry, is pivotal to the complex's function. However, experimentally determining the structures of many protein complexes remains a major bottleneck. Structural bioinformatics approaches, such as the deep learning algorithm AlphaFold2-multimer (AF2-multimer), leverage the co-evolution of amino acids and sequence-structure relationships for accurate de novo structure and contact prediction. Pseudo-likelihood maximization direct coupling analysis (plmDCA) has been used to detect co-evolving residue pairs by statistical modeling. Here, we provide evidence that combining both methods can be used for de novo prediction of the quaternary structure and stoichiometry of a protein complex. We achieve this by augmenting the existing AF2-multimer confidence metrics with an interpretable score that identifies the complex with the optimal fraction of native contacts among co-evolving residue pairs at intermolecular interfaces. We use this strategy to predict the quaternary structures and non-trivial stoichiometries of Bacillus subtilis spore germination protein complexes with unknown structures. Co-evolution at intermolecular interfaces may therefore synergize with AI-based de novo quaternary structure prediction of structurally uncharacterized bacterial protein complexes.
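One way to read "fraction of native contacts of co-evolving residue pairs" is sketched below: for a candidate homo-oligomer model, count how many top-ranked co-evolving pairs (i, j) appear as a contact between two different chains. The 8 Å cutoff, the one-point-per-residue representation and the function name are assumptions, and the paper's exact score and its combination with AF2-multimer confidence metrics are not reproduced. Comparing this fraction across dimer, trimer, ... models of the same protein gives a simple way to rank candidate stoichiometries.

```python
from itertools import permutations

import numpy as np


def coupled_pair_contact_fraction(chains: list[np.ndarray], coupled_pairs: list[tuple[int, int]],
                                  cutoff: float = 8.0) -> float:
    """Fraction of co-evolving residue pairs realized as an inter-chain contact in a model.

    chains: one (N, 3) array per copy in a homo-oligomer model, with one
            representative atom per residue (e.g. C-beta).
    coupled_pairs: (i, j) residue indices, e.g. top-ranked plmDCA couplings.
    A pair is satisfied if residue i of some chain lies within `cutoff` Å of
    residue j of a *different* chain.
    """
    if not coupled_pairs:
        return 0.0
    satisfied = 0
    for i, j in coupled_pairs:
        # permutations(chains, 2) yields every ordered pair of distinct chains.
        if any(np.linalg.norm(a[i] - b[j]) < cutoff for a, b in permutations(chains, 2)):
            satisfied += 1
    return satisfied / len(coupled_pairs)
```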
Affiliation(s)
- Max Kilian
  - Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
  - BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
  - Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
- Ilka B Bischofs
  - Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
  - BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
  - Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany

159
Mahlich Y, Zhu C, Chung H, Velaga PK, De Paolis Kaluza M, Radivojac P, Friedberg I, Bromberg Y. Learning from the unknown: exploring the range of bacterial functionality. Nucleic Acids Res 2023; 51:10162-10175. [PMID: 37739408 PMCID: PMC10602916 DOI: 10.1093/nar/gkad757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 09/11/2023] [Indexed: 09/24/2023]
Abstract
Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to the addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free, nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables the identification of shared functionality among very distant, possibly structurally different, microbial homologs. Our work can thus help annotate the functional repertoires of bacterial organisms and further guide our understanding of microbial communities.
Affiliation(s)
- Yannick Mahlich
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Chengsheng Zhu
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
  - Xbiome Inc., 1 Broadway, 14th fl, Cambridge, MA 02142, USA
- Henri Chung
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
  - Interdepartmental program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Pavan K Velaga
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- M Clara De Paolis Kaluza
  - Khoury College of Computer Sciences, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
  - Interdepartmental program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
  - Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, USA

160
Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins 2023. [PMID: 37850517 DOI: 10.1002/prot.26614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and significantly impact fundamental and translational research, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas, and initiatives such as the 3D-Beacons Network, provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. While substantial advances have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate discoveries in disease pathogenesis and drug development.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maxim Tsenkov
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK

161
Rudolph A, Nyerges A, Chiappino-Pepe A, Landon M, Baas-Thomas M, Church G. Strategies to identify and edit improvements in synthetic genome segments episomally. Nucleic Acids Res 2023; 51:10094-10106. [PMID: 37615546 PMCID: PMC10570025 DOI: 10.1093/nar/gkad692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 06/30/2023] [Accepted: 08/16/2023] [Indexed: 08/25/2023]
Abstract
Genome engineering projects often use bacterial artificial chromosomes (BACs) to carry multi-kilobase DNA segments at low copy number. However, every stage of whole-genome engineering can introduce mutations into the synthetic genome that reduce or eliminate the fitness of the final strain. Here, we describe improvements to a multiplex automated genome engineering (MAGE) protocol that increase recombineering frequency and multiplexability. This protocol was applied to recoding an Escherichia coli strain to replace seven codons genome-wide with synonymous alternatives. Ten de novo synthesized DNA segments of 44,402-47,179 bp, contained in a BAC from the recoded strain, were unable to complement deletion of the corresponding 33-61 wild-type genes using a single antibiotic resistance marker. Next-generation sequencing (NGS) identified 1-7 non-recoding mutations in essential genes per segment, and MAGE in turn proved a useful strategy for repairing these mutations on the recoded segment contained in the BAC when both the recoded and wild-type copies of the mutated genes had to be present during the repair process. Finally, two web-based tools were used to predict the impact of a subset of non-recoding missense mutations on strain fitness using protein structure and function calls.
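The core bookkeeping of codon recoding, swapping codons in frame for synonymous alternatives while leaving the encoded protein unchanged, can be sketched as below. The abstract does not list the seven replaced codons, so `HYPOTHETICAL_REPLACEMENTS` is an illustrative four-codon mapping, not the scheme used in the study; real recoding designs also face constraints (such as overlapping genes) that this sketch ignores.

```python
# Standard genetic code (NCBI table 1), built from the compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative only: each pair encodes the same amino acid (or stop).
HYPOTHETICAL_REPLACEMENTS = {"TTA": "CTG", "AGC": "TCG", "AGA": "CGT", "TAG": "TAA"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[p:p + 3]] for p in range(0, len(cds) - 2, 3))


def recode(cds: str, replacements: dict[str, str]) -> str:
    """Replace codons in frame, refusing any replacement that is not synonymous."""
    for old, new in replacements.items():
        if CODON_TABLE[old] != CODON_TABLE[new]:
            raise ValueError(f"{old}->{new} is not synonymous")
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[p:p + 3] for p in range(0, len(cds), 3))
    return "".join(replacements.get(c, c) for c in codons)
```

Checking that `translate(recode(cds, mapping)) == translate(cds)` is a cheap guard against accidental non-synonymous changes.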
Affiliation(s)
- Alexandra Rudolph
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Akos Nyerges
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Anush Chiappino-Pepe
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Boston, MA 02115, USA
- Matthieu Landon
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- George Church
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Boston, MA 02115, USA

162
Mahmud S, Morehead A, Cheng J. Accurate prediction of protein tertiary structural changes induced by single-site mutations with equivariant graph neural networks. bioRxiv 2023:2023.10.03.560758. [PMID: 37873289 PMCID: PMC10592624 DOI: 10.1101/2023.10.03.560758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Predicting the changes in protein tertiary structure caused by single-site mutations is important for studying protein structure, function, and interaction. Although computational protein structure prediction methods such as AlphaFold can predict the overall tertiary structures of most proteins rather accurately, they are not sensitive enough to accurately predict the structural changes induced by single-site amino acid mutations. Specialized mutation prediction methods mostly focus on predicting the overall stability or functional changes caused by mutations without attempting to predict the exact mutation-induced structural changes, limiting their use in protein mutation studies. In this work, we develop the first deep learning method based on equivariant graph neural networks (EGNN) to directly predict the tertiary structural changes caused by single-site mutations, and the tertiary structure of any protein mutant, from the structure of its wild-type counterpart. The results show that it performs substantially better than the widely used protein structure prediction method AlphaFold in predicting the tertiary structures of protein mutants.
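The key property of an equivariant graph neural network is that rotating or translating the input coordinates transforms the predicted coordinates in the same way, while node features stay unchanged. Below is a minimal NumPy sketch of one generic E(n)-equivariant layer in the style of Satorras, Hoogeboom and Welling (2021), with fixed random (untrained) weights; it is not the authors' architecture, and how they encode the mutation and the wild-type structure is not described in the abstract. The class and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(sizes: list[int]):
    """Tiny MLP with fixed random weights (untrained), tanh on hidden layers."""
    weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(x: np.ndarray) -> np.ndarray:
        for k, w in enumerate(weights):
            x = x @ w
            if k < len(weights) - 1:
                x = np.tanh(x)
        return x

    return forward


class EGNNLayer:
    """One E(n)-equivariant message-passing layer on a fully connected graph."""

    def __init__(self, feat_dim: int, msg_dim: int = 16):
        self.phi_e = mlp([2 * feat_dim + 1, msg_dim, msg_dim])  # edge messages
        self.phi_x = mlp([msg_dim, msg_dim, 1])                 # coordinate weights
        self.phi_h = mlp([feat_dim + msg_dim, msg_dim, feat_dim])  # feature update

    def __call__(self, h: np.ndarray, x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        n = len(h)
        diff = x[:, None, :] - x[None, :, :]                # (n, n, 3) relative positions
        dist2 = np.sum(diff**2, axis=-1, keepdims=True)     # invariant to rotation/translation
        pair = np.concatenate(
            [np.repeat(h[:, None], n, axis=1), np.repeat(h[None], n, axis=0), dist2], axis=-1
        )
        m = self.phi_e(pair) * (1.0 - np.eye(n)[:, :, None])  # messages, no self-edges
        # Coordinates move along relative-position vectors weighted by invariant scalars.
        x_new = x + np.sum(diff * self.phi_x(m), axis=1) / (n - 1)
        h_new = h + self.phi_h(np.concatenate([h, m.sum(axis=1)], axis=-1))
        return h_new, x_new
```

Because the layer only sees coordinates through squared distances and relative-position vectors, applying an orthogonal transform and a shift to the input gives the same features and correspondingly transformed coordinates.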

163
Ru J, Xue J, Sun J, Cova L, Deng L. Unveiling the hidden role of aquatic viruses in hydrocarbon pollution bioremediation. J Hazard Mater 2023; 459:132299. [PMID: 37597386 DOI: 10.1016/j.jhazmat.2023.132299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/28/2023] [Accepted: 08/12/2023] [Indexed: 08/21/2023]
Abstract
Hydrocarbon pollution poses substantial environmental risks to water and soil. Bioremediation, which uses microorganisms to manage pollutants, offers a cost-effective solution. However, the role of viruses, particularly bacteriophages (phages), in bioremediation remains unexplored. This study examines the diversity and activity of hydrocarbon-degradation genes encoded by environmental viruses, focusing on phages, within public databases. We identified 57 high-quality phage-encoded auxiliary metabolic genes (AMGs) related to hydrocarbon degradation, which we refer to as virus-encoded hydrocarbon degradation genes (vHYDEGs). These genes are encoded by taxonomically diverse aquatic phages and highlight how under-characterized the global virosphere remains. Six protein families involved in the initial alkane hydroxylation steps were identified. Phylogenetic analyses showed diverse evolutionary trajectories of vHYDEGs across habitats and uncovered previously unknown biodegraders evolutionarily linked with vHYDEGs. Our findings suggest that phage AMGs may contribute to alkane and aromatic hydrocarbon degradation by participating in the initial, rate-limiting hydroxylation steps, thereby aiding hydrocarbon pollution bioremediation and promoting phage propagation. To support future research, we developed vHyDeg, a database of the identified vHYDEGs with comprehensive annotations, which facilitates the screening of hydrocarbon degradation AMGs and encourages their bioremediation applications.
Affiliation(s)
- Jinlong Ru
- Institute of Virology, Helmholtz Centre Munich - German Research Centre for Environmental Health, Neuherberg 85764, Germany; Chair of Prevention for Microbial Infectious Disease, Central Institute of Disease Prevention and School of Life Sciences, Technical University of Munich, Freising 85354, Germany
- Jinling Xue
- Institute of Virology, Helmholtz Centre Munich - German Research Centre for Environmental Health, Neuherberg 85764, Germany; Chair of Prevention for Microbial Infectious Disease, Central Institute of Disease Prevention and School of Life Sciences, Technical University of Munich, Freising 85354, Germany
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Oxford OX3 7LD, UK
- Linda Cova
- Institute of Virology, Helmholtz Centre Munich - German Research Centre for Environmental Health, Neuherberg 85764, Germany
- Li Deng
- Institute of Virology, Helmholtz Centre Munich - German Research Centre for Environmental Health, Neuherberg 85764, Germany; Chair of Prevention for Microbial Infectious Disease, Central Institute of Disease Prevention and School of Life Sciences, Technical University of Munich, Freising 85354, Germany.

164
Lategan FA, Schreiber C, Patterton HG. SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures. BMC Bioinformatics 2023; 24:373. [PMID: 37789284 PMCID: PMC10546711 DOI: 10.1186/s12859-023-05498-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/25/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND The relationship between the sequence of a protein, its structure, and the resulting connection between its structure and function, is a foundational principle in biological science. Only recently has the computational prediction of protein structure based only on protein sequence been addressed effectively by AlphaFold, a neural network approach that can predict the majority of protein structures with X-ray crystallographic accuracy. A question that is now of acute relevance is the "inverse protein folding problem": predicting the sequence of a protein that folds into a specified structure. This will be of immense value in protein engineering and biotechnology, and will allow the design and expression of recombinant proteins that can, for instance, fold into specified structures as a scaffold for the attachment of recombinant antigens, or enzymes with modified or novel catalytic activities. Here we describe the development of SeqPredNN, a feed-forward neural network trained with X-ray crystallographic structures from the RCSB Protein Data Bank to predict the identity of amino acids in a protein structure using only the relative positions, orientations, and backbone dihedral angles of nearby residues. RESULTS We predict the sequence of a protein expected to fold into a specified structure and assess the accuracy of the prediction using both AlphaFold and RoseTTAFold to computationally generate the fold of the derived sequence. We show that the sequences predicted by SeqPredNN fold into a structure with a median TM-score of 0.638 when compared to the crystal structure according to AlphaFold predictions, yet these sequences are unique and only 28.4% identical to the sequence of the crystallized protein. CONCLUSIONS We propose that SeqPredNN will be a valuable tool to generate proteins of defined structure for the design of novel biomaterials, pharmaceuticals, catalysts, and reporter systems. 
The low sequence identity of its predictions compared to the native sequence could prove useful for developing proteins with modified physical properties, such as water solubility and thermal stability. The speed and ease of use of SeqPredNN offer a significant advantage over physics-based protein design methods.
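Because SeqPredNN predicts one amino acid for each residue position of the input structure, the designed and native sequences have the same length, so the 28.4% figure can be read as a position-by-position match rate. The sketch below shows that calculation under the assumption of ungapped, equal-length sequences; the function name and toy sequences are hypothetical, and the paper may compute identity differently (for example, after an explicit alignment).

```python
def percent_identity(designed: str, native: str) -> float:
    """Percentage of positions at which two ungapped, equal-length sequences agree.

    Hypothetical helper: assumes one predicted residue per native position,
    so no alignment step is needed.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length (ungapped comparison assumed)")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(designed, native) if a == b)
    return 100.0 * matches / len(native)


if __name__ == "__main__":
    # Hypothetical 10-residue toy sequences that share only their first three residues.
    print(percent_identity("ACDWWWWWWW", "ACDEFGHIKL"))  # 30.0
```

A median over many designed/native pairs would then summarize a test set, but which summary statistic the authors used for the 28.4% figure is not stated in the abstract.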
Affiliation(s)
- F Adriaan Lategan
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Caroline Schreiber
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Hugh G Patterton
- Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa.

165
Pogozheva ID, Cherepanov S, Park SJ, Raghavan M, Im W, Lomize AL. Structural Modeling of Cytokine-Receptor-JAK2 Signaling Complexes Using AlphaFold Multimer. J Chem Inf Model 2023; 63:5874-5895. [PMID: 37694948 DOI: 10.1021/acs.jcim.3c00926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Homodimeric class 1 cytokine receptors include the erythropoietin (EPOR), thrombopoietin (TPOR), granulocyte colony-stimulating factor 3 (CSF3R), growth hormone (GHR), and prolactin receptors (PRLR). These cell-surface single-pass transmembrane (TM) glycoproteins regulate cell growth, proliferation, and differentiation and induce oncogenesis. An active TM signaling complex consists of a receptor homodimer, one or two ligands bound to the receptor extracellular domains, and two molecules of Janus Kinase 2 (JAK2) constitutively associated with the receptor intracellular domains. Although crystal structures of soluble extracellular domains with ligands have been obtained for all of the receptors except TPOR, little is known about the structure and dynamics of the complete TM complexes that activate the downstream JAK-STAT signaling pathway. Three-dimensional models of five human receptor complexes with cytokines and JAK2 were generated here by using AlphaFold Multimer. Given the large size of the complexes (from 3220 to 4074 residues), the modeling required a stepwise assembly from smaller parts, with selection and validation of the models through comparisons with published experimental data. The modeling of active and inactive complexes supports a general activation mechanism that involves ligand binding to a monomeric receptor followed by receptor dimerization and rotational movement of the receptor TM α-helices, causing proximity, dimerization, and activation of associated JAK2 subunits. The binding mode of two eltrombopag molecules to the TM α-helices of the active TPOR dimer was proposed. The models also help elucidate the molecular basis of oncogenic mutations that may involve a noncanonical activation route. Models equilibrated in explicit lipids of the plasma membrane are publicly available.
Affiliation(s)
- Irina D Pogozheva
- Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States
- Stanislav Cherepanov
- Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
- Sang-Jun Park
- Departments of Biological Sciences and Chemistry, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Malini Raghavan
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109, United States
- Wonpil Im
- Departments of Biological Sciences and Chemistry, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Andrei L Lomize
- Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States

166
Sarkar D, Lee H, Vant JW, Turilli M, Vermaas JV, Jha S, Singharoy A. Adaptive Ensemble Refinement of Protein Structures in High Resolution Electron Microscopy Density Maps with Radical Augmented Molecular Dynamics Flexible Fitting. J Chem Inf Model 2023; 63:5834-5846. [PMID: 37661856 DOI: 10.1021/acs.jcim.3c00350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, and refine molecular structures. Going beyond a single stationary-structure determination scheme, it is becoming more common to interpret the experimental data with an ensemble of models that contributes to an average observation. Hence, there is a need to decide on the quality of an ensemble of protein structures on-the-fly while refining them against the density maps. We introduce such an adaptive decision-making scheme during the molecular dynamics flexible fitting (MDFF) of biomolecules. Using RADICAL-Cybertools, the new RADICAL augmented MDFF implementation (R-MDFF) is examined in high-performance computing environments for refinement of two prototypical protein systems, adenylate kinase and carbon monoxide dehydrogenase. For these test cases, use of multiple replicas in flexible fitting with adaptive decision making in R-MDFF improves the overall correlation to the density by 40% relative to the refinements of the brute-force MDFF. The improvements are particularly significant at high, 2-3 Å map resolutions. More importantly, the ensemble model captures key features of biologically relevant molecular dynamics that are inaccessible to a single-model interpretation. Finally, the pipeline is applicable to systems of growing sizes, which is demonstrated using ensemble refinement of capsid proteins from the chimpanzee adenovirus. The overhead for decision making remains low and robust to computing environments. The software is publicly available on GitHub and includes a short user guide to install R-MDFF on different computing environments, from local Linux-based workstations to high-performance computing environments.
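The adaptive step described above, judging ensemble quality on the fly and steering refinement toward replicas that fit the map better, can be illustrated with a generic select-and-clone rule. This is an assumption-laden toy, not R-MDFF or the RADICAL-Cybertools API: it scores each replica by the Pearson correlation between its simulated density and the experimental map (both assumed to be flattened to voxel lists on a shared grid), keeps the `keep` best, and refills the ensemble by cloning survivors. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def map_correlation(sim: Sequence[float], exp: Sequence[float]) -> float:
    """Pearson correlation between two density maps flattened on the same voxel grid."""
    n = len(sim)
    if n != len(exp) or n < 2:
        raise ValueError("maps must share a grid of at least two voxels")
    mean_s, mean_e = sum(sim) / n, sum(exp) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(sim, exp))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_e = sum((e - mean_e) ** 2 for e in exp)
    if var_s == 0.0 or var_e == 0.0:
        return 0.0  # a flat map carries no shape information
    return cov / math.sqrt(var_s * var_e)


def select_and_clone(replica_maps: List[List[float]], exp_map: List[float], keep: int) -> List[int]:
    """Hypothetical adaptive step: rank replicas by fit to the map, keep the best
    `keep`, and return which surviving replica seeds each slot of the next round."""
    if not 1 <= keep <= len(replica_maps):
        raise ValueError("keep must be between 1 and the ensemble size")
    scores = [map_correlation(m, exp_map) for m in replica_maps]
    ranked = sorted(range(len(replica_maps)), key=lambda i: scores[i], reverse=True)
    survivors = ranked[:keep]
    # Round-robin cloning keeps the ensemble size constant between rounds.
    return [survivors[slot % keep] for slot in range(len(replica_maps))]


if __name__ == "__main__":
    exp = [0.0, 1.0, 2.0, 3.0]
    replicas = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
    print(select_and_clone(replicas, exp, keep=2))  # [0, 2, 0, 2]
```

In a real workflow this decision would run between flexible-fitting segments on actual simulated and experimental maps; the 40% correlation gain reported above comes from the authors' implementation, not from this toy.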
Affiliation(s)
- Daipayan Sarkar
- MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
- Hyungro Lee
- Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- John W Vant
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
- Matteo Turilli
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
- Josh V Vermaas
- MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
- Shantenu Jha
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
- Abhishek Singharoy
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States

167
Xu G, Luo Z, Zhou R, Wang Q, Ma J. OPUS-Fold3: a gradient-based protein all-atom folding and docking framework on TensorFlow. Brief Bioinform 2023; 24:bbad365. [PMID: 37833840 DOI: 10.1093/bib/bbad365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/29/2023] [Accepted: 09/20/2023] [Indexed: 10/15/2023]
Abstract
For refining and designing protein structures, it is essential to have an efficient protein folding and docking framework that generates a protein 3D structure based on given constraints. In this study, we introduce OPUS-Fold3, a gradient-based, all-atom protein folding and docking framework that accurately generates 3D protein structures satisfying specified constraints, such as any potential function that can be expressed as a function of heavy-atom positions. Our tests show, for example, that OPUS-Fold3 performs comparably to pyRosetta in backbone folding and significantly better in side-chain modeling. Developed using Python and TensorFlow 2.4, OPUS-Fold3 is easy to modify at the source-code level and can be seamlessly combined with other deep learning models, thus facilitating collaboration between the biology and AI communities. The source code of OPUS-Fold3 can be downloaded from http://github.com/OPUS-MaLab/opus_fold3. It is freely available for academic use.
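The core idea of a gradient-based folding framework is that any constraint written as a differentiable function of atom positions can be minimized directly with respect to the coordinates. OPUS-Fold3 does this in TensorFlow with all-atom potentials; the sketch below is only a toy stand-in in plain Python, assuming a harmonic distance-restraint potential with a hand-derived gradient, a fixed learning rate, and hypothetical function names and restraint values.

```python
import math
import random
from typing import Dict, List, Tuple

Coords = List[List[float]]
Restraints = Dict[Tuple[int, int], float]  # (atom i, atom j) -> target distance in Å


def restraint_loss(x: Coords, restraints: Restraints) -> float:
    """Harmonic restraint energy: sum of (|x_i - x_j| - d0)^2 over all restraints."""
    return sum((math.dist(x[i], x[j]) - d0) ** 2 for (i, j), d0 in restraints.items())


def fold(x0: Coords, restraints: Restraints, lr: float = 0.05, steps: int = 2000) -> Coords:
    """Plain gradient descent on Cartesian coordinates under distance restraints (toy)."""
    x = [list(p) for p in x0]  # copy so the caller's coordinates are untouched
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in x]
        for (i, j), d0 in restraints.items():
            d = math.dist(x[i], x[j])
            if d == 0.0:
                continue  # gradient direction is undefined for coincident atoms
            # dL/dx_i = 2 (d - d0) (x_i - x_j) / d; x_j gets the opposite sign.
            coef = 2.0 * (d - d0) / d
            for k in range(3):
                g = coef * (x[i][k] - x[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for p, g in zip(x, grad):
            for k in range(3):
                p[k] -= lr * g[k]
    return x


if __name__ == "__main__":
    random.seed(0)
    start = [[random.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(4)]
    # Hypothetical restraints for a 4-atom chain (values chosen only for illustration).
    restraints = {(0, 1): 3.8, (1, 2): 3.8, (2, 3): 3.8, (0, 2): 5.5, (1, 3): 5.5, (0, 3): 6.0}
    print(restraint_loss(start, restraints), "->", restraint_loss(fold(start, restraints), restraints))
```

Writing the same loss with an automatic-differentiation library would remove the hand-derived gradient, which is what makes such a framework easy to extend with new constraint terms.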
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 201210, China
- Shanghai AI Laboratory, Shanghai, 200030, China
- Zhenwei Luo
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 201210, China
- Shanghai AI Laboratory, Shanghai, 200030, China
- Ruhong Zhou
- Institute of Quantitative Biology, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Shanghai Institute for Advanced Study, Zhejiang University, Shanghai, 201203, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai, 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 201210, China
- Shanghai AI Laboratory, Shanghai, 200030, China
- Shanghai Institute for Advanced Study, Zhejiang University, Shanghai, 201203, China

168
Liu J, Guo Z, Wu T, Roy RS, Chen C, Cheng J. Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15. Commun Chem 2023; 6:188. [PMID: 37679431 PMCID: PMC10484931 DOI: 10.1038/s42004-023-00991-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/02/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023]
Abstract
Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its predictions. We developed a new version of the MULTICOM system that samples diverse multiple sequence alignments (MSAs) and structural templates to improve the input from which AlphaFold2 generates structural models. The models are then ranked by both pairwise model similarity and the AlphaFold2 self-reported model quality score. The top-ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interactions. The system participated in tertiary structure prediction in the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors, and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, respectively, which are 9.6% and 8.2% higher than the ~0.73 and ~0.85 of the standard AlphaFold2 predictor.
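The 9.6% and 8.2% figures are relative gains over the AlphaFold2 baseline rather than absolute differences; in both metrics the absolute gap is about 0.07. A quick check using the rounded averages quoted above (the helper name is hypothetical, and the paper's unrounded averages may shift the last digit):

```python
def relative_gain(new: float, baseline: float) -> float:
    """Hypothetical helper: improvement of `new` over `baseline`, as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


# Rounded averages quoted in the abstract (MULTICOM_refine vs. standard AlphaFold2).
print(f"GDT-TS:   {relative_gain(0.80, 0.73):.1f}%")  # GDT-TS:   9.6%
print(f"TM-score: {relative_gain(0.92, 0.85):.1f}%")  # TM-score: 8.2%
```

The same 0.07 absolute gap yields a smaller relative gain for TM-score simply because its baseline is higher.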
Affiliation(s)
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.

169
Wallner B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 2023; 39:btad573. [PMID: 37713472 PMCID: PMC10534052 DOI: 10.1093/bioinformatics/btad573] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 03/13/2023] [Revised: 05/29/2023] [Accepted: 09/14/2023] [Indexed: 09/17/2023]
Abstract
SUMMARY The AlphaFold2 neural network model has revolutionized structural biology with unprecedented performance. We demonstrate that by stochastically perturbing the neural network (enabling dropout at inference) and combining this with massive sampling, it is possible to improve the quality of the generated models. We generated ∼6000 models per target, compared with the default of 25 for AlphaFold-Multimer, using v1 and v2 multimer network models with and without templates, and increased the number of recycles within the network. The method was benchmarked in CASP15; compared with AlphaFold-Multimer v2, it improved the average DockQ from 0.41 to 0.55 using identical input and was ranked at the very top in the protein assembly category among all groups participating in CASP15. The simplicity of the method should facilitate its adoption by the field, and the method should be useful for anyone interested in modeling multimeric structures, alternate conformations, or flexible structures. AVAILABILITY AND IMPLEMENTATION AFsample is available online at http://wallnerlab.org/AFsample.
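DockQ, the interface-quality measure behind the 0.41 to 0.55 comparison, was introduced by Basu and Wallner (2016) as the mean of three CAPRI-derived terms: the fraction of native contacts (Fnat) and two RMSDs rescaled to the range 0 to 1, the ligand RMSD (LRMS, scaled with d = 8.5 Å) and the interface RMSD (iRMS, scaled with d = 1.5 Å). The sketch below implements that formula and the quality bands proposed in the same paper. It assumes Fnat, LRMS and iRMS have already been computed against a reference complex, and its function names are hypothetical; it is not the interface of the DockQ program itself.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD (in Å) into (0, 1]; the value is 0.5 when rms == d."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, lrms: float, irms: float) -> float:
    """DockQ: mean of Fnat, scaled LRMS (d = 8.5 Å) and scaled iRMS (d = 1.5 Å)."""
    return (fnat + scaled_rms(lrms, 8.5) + scaled_rms(irms, 1.5)) / 3.0


def quality_band(score: float) -> str:
    """CAPRI-style quality band for a single model's DockQ score."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


if __name__ == "__main__":
    # Hypothetical model: 60% native contacts, LRMS 4.0 Å, iRMS 1.5 Å.
    s = dockq(0.6, 4.0, 1.5)
    print(round(s, 3), quality_band(s))
```

The 0.41 and 0.55 values in the abstract are averages over targets, so the bands describe individual models rather than those averages.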
Affiliation(s)
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden

170
Herrington NB, Stein D, Li YC, Pandey G, Schlessinger A. Exploring the Druggable Conformational Space of Protein Kinases Using AI-Generated Structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.31.555779. [PMID: 37693436 PMCID: PMC10491245 DOI: 10.1101/2023.08.31.555779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and αC-helix motifs, which enable kinases to adopt various conformational states. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the kinase conformation(s) they bind. However, the limited availability of experimentally determined structural data for kinases in inactive states restricts drug discovery efforts for this major protein family. Modern AI-based structural modeling methods hold potential for exploring the previously experimentally uncharted druggable conformational space for kinases. Here, we first evaluated the currently explored conformational space of kinases in the PDB and models generated by AlphaFold2 (AF2) (1) and ESMFold (2), two prominent AI-based structure prediction methods. We then investigated AF2's ability to predict kinase structures in different conformations at various multiple sequence alignment (MSA) depths, based on this parameter's ability to explore conformational diversity. Our results showed a bias within the PDB and predicted structural models generated by AF2 and ESMFold toward structures of kinases in the active state over alternative conformations, particularly those conformations controlled by the DFG motif. Finally, we demonstrate that predicting kinase structures using AF2 at lower MSA depths allows the exploration of the space of these alternative conformations, including identifying previously unobserved conformations for 398 kinases. The results of our analysis of structural modeling by AF2 create a new avenue for the pursuit of new therapeutic agents against a notoriously difficult-to-target family of proteins.
Significance Statement Greater abundance of kinase structural data in inactive conformations, currently lacking in structural databases, would improve our understanding of how protein kinases function and expand drug discovery and development for this family of therapeutic targets. Modern approaches utilizing artificial intelligence and machine learning have potential for efficiently capturing novel protein conformations. We provide evidence for a bias within AlphaFold2 and ESMFold to predict structures of kinases in their active states, similar to their overrepresentation in the PDB. We show that lowering the AlphaFold2 algorithm's multiple sequence alignment depth can help explore kinase conformational space more broadly. It can also enable the prediction of hundreds of kinase structures in novel conformations, many of whose models are likely viable for drug discovery.
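Lowering MSA depth, as described above, means giving AF2 fewer aligned sequences per prediction; the usual rationale is that a shallower MSA weakens the evolutionary signal that pins the model to one dominant conformation. A minimal sketch of the subsampling step alone, assuming an MSA held as a list of aligned strings with the query in row 0; the function name, toy alignment and seeds are hypothetical, and in practice depth is set through the prediction pipeline's own configuration, which is not shown here.

```python
import random
from typing import List, Optional


def subsample_msa(msa: List[str], depth: int, seed: Optional[int] = None) -> List[str]:
    """Return a shallower MSA: the query (row 0) plus up to depth - 1 other rows,
    drawn uniformly at random without replacement (hypothetical helper)."""
    if not msa:
        raise ValueError("the MSA must contain at least the query sequence")
    if depth < 1:
        raise ValueError("depth must be at least 1 (the query itself)")
    rng = random.Random(seed)
    others = msa[1:]
    k = min(depth - 1, len(others))
    return [msa[0]] + rng.sample(others, k)


if __name__ == "__main__":
    toy_msa = ["MKV-LA", "MKI-LA", "MRVELA", "LKV-LS", "MKVQLA"]  # hypothetical alignment
    # Several independent shallow draws; each would feed a separate prediction run.
    for seed in range(3):
        print(subsample_msa(toy_msa, depth=3, seed=seed))
```

Repeating the draw with different seeds is one simple way to obtain a set of shallow inputs whose predicted structures can then be compared for conformational diversity.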

171
Mijit A, Wang X, Li Y, Xu H, Chen Y, Xue W. Mapping synthetic binding proteins epitopes on diverse protein targets by protein structure prediction and protein-protein docking. Comput Biol Med 2023; 163:107183. [PMID: 37352638 DOI: 10.1016/j.compbiomed.2023.107183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 06/25/2023]
Abstract
Synthetic binding proteins (SBPs) are a class of artificial proteins engineered from privileged protein scaffolds, which can form highly specific molecular recognition interfaces with a variety of targets. Owing to their small size, high stability, and good tissue permeability, SBPs have important applications in biomedical research and in disease diagnosis and treatment. However, knowledge of SBP epitopes on the structures of target proteins is still limited, which hinders the development of novel SBPs. In this study, based on the currently available information on SBPs and their targets, 96 pairs of interacting proteins, comprising 96 representative SBPs and 80 different targets, were systematically investigated using state-of-the-art computational modeling techniques, including AlphaFold2 protein structure prediction and Rosetta protein-protein docking. As a result, 71 of the 96 pairs were successfully docked, of which 18, 33, and 20 pairs were classified as high-, medium-, and acceptable-quality models, respectively. In addition, the interface information was analyzed to decipher the interaction types that drive SBP-target recognition. Overall, this work provides important structural information not only for understanding the mechanism of action of other SBPs with the same protein scaffold, but also for aiding rational protein engineering and the design of novel SBPs for biomedical applications.
Affiliation(s)
- Arzu Mijit
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Xiaona Wang
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Yanlin Li
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Hangwei Xu
- School of Medicine, Hangzhou City University, Hangzhou, 310000, China
- Yingjun Chen
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China.
- Weiwei Xue
- Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China.

172
Vitting-Seerup K. Most protein domains exist as variants with distinct functions across cells, tissues and diseases. NAR Genom Bioinform 2023; 5:lqad084. [PMID: 37745975 PMCID: PMC10516350 DOI: 10.1093/nargab/lqad084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/09/2023] [Accepted: 09/05/2023] [Indexed: 09/26/2023]
Abstract
Protein domains are the active subunits that provide proteins with specific functions through precise three-dimensional structures. Such domains facilitate most protein functions, including molecular interactions and signal transduction. Currently, these protein domains are described and analyzed as invariable molecular building blocks with fixed functions. Here, I show that most human protein domains exist as multiple distinct variants termed 'domain isotypes'. Domain isotypes are used in a cell-, tissue- and disease-specific manner and have surprisingly different 3D structures. Accordingly, domain isotypes, compared to each other, modulate or abolish the functionality of protein domains. These results challenge the current view of protein domains as invariable building blocks and have significant implications for both wet- and dry-lab workflows. The extensive use of protein domain isotypes within protein isoforms adds to the literature indicating that we need to transition to an isoform-centric research paradigm.
Affiliation(s)
- Kristoffer Vitting-Seerup
- The Bioinformatics Section, Department of Health Technology, The Technical University of Denmark (DTU), Denmark

173
Bradshaw M, Squire JM, Morris E, Atkinson G, Richardson R, Lees J, Caputo M, Bigotti GM, Paul DM. Zebrafish as a model for cardiac disease; Cryo-EM structure of native cardiac thin filaments from Danio Rerio. J Muscle Res Cell Motil 2023; 44:179-192. [PMID: 37480427 PMCID: PMC10542308 DOI: 10.1007/s10974-023-09653-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 07/04/2023] [Indexed: 07/24/2023]
Abstract
Actin, tropomyosin and troponin, the proteins that comprise the contractile apparatus of the cardiac thin filament, are highly conserved across species. We have used cryo-EM to study the three-dimensional structure of the zebrafish cardiac thin and actin filaments. With 70% of human genes having an obvious zebrafish orthologue, and conservation of 85% of disease-causing genes, zebrafish are a good animal model for the study of human disease. Our structure of the zebrafish thin filament reveals the molecular interactions between the constituent proteins, showing that the fundamental organisation of the complex is the same as that reported in the human reconstituted thin filament. A reconstruction of zebrafish cardiac F-actin demonstrates no deviations from human cardiac actin over an extended length of 14 actin subunits. Fitting zebrafish homology models into our maps enabled us to compare, in detail, their similarity to human models. The structural similarities of troponin-T in particular, a region known to contain a hypertrophic cardiomyopathy 'hotspot', confirm the suitability of zebrafish to study these disease-causing mutations.
Affiliation(s)
- Marston Bradshaw
- Physiology, Pharmacology and Neuroscience, University of Bristol, Bristol, UK
- John M Squire
- Physiology, Pharmacology and Neuroscience, University of Bristol, Bristol, UK
- Edward Morris
- University of Glasgow, Glasgow, UK
- Institute of Cancer Research, London, UK
- Georgia Atkinson
- Translational Health Sciences, University of Bristol, Bristol, UK
- Rebecca Richardson
- Physiology, Pharmacology and Neuroscience, University of Bristol, Bristol, UK
- Jon Lees
- Translational Health Sciences, University of Bristol, Bristol, UK
- Massimo Caputo
- Translational Health Sciences, University of Bristol, Bristol, UK
- Giulia M Bigotti
- Translational Health Sciences, University of Bristol, Bristol, UK
- Danielle M Paul
- Physiology, Pharmacology and Neuroscience, University of Bristol, Bristol, UK.

174
Chenna A, Khan WH, Dash R, Saraswat S, Chugh A, Rathore AS, Goel G. An efficient computational protocol for template-based design of peptides that inhibit interactions involving SARS-CoV-2 proteins. Proteins 2023; 91:1222-1234. [PMID: 37283297 DOI: 10.1002/prot.26511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 06/08/2023]
Abstract
The RNA-dependent RNA polymerase (RdRp) complex of SARS-CoV-2 lies at the core of its replication and transcription processes. The interfaces between holo-RdRp subunits are highly conserved, facilitating the design of inhibitors with high affinity for the interaction interface hotspots. We, therefore, take this as a model protein complex for the application of a structural bioinformatics protocol to design peptides that inhibit RdRp complexation by preferential binding at the interface of its core subunit nonstructural protein, nsp12, with accessory factor nsp7. Here, the interaction hotspots of the nsp7-nsp12 subunit of RdRp, determined from a long molecular dynamics trajectory, are used as a template. A large library of peptide sequences constructed from multiple hotspot motifs of nsp12 is screened in silico to determine sequences with high geometric complementarity and interaction specificity for the binding interface of nsp7 (target) in the complex. Two lead designed peptides are extensively characterized using orthogonal bioanalytical methods to determine their suitability for inhibition of RdRp complexation. The binding affinity of these peptides to accessory factor nsp7, determined using a surface plasmon resonance (SPR) assay, is slightly better than that of nsp12: dissociation constants of 133 nM and 167 nM, respectively, compared to 473 nM for nsp12. A competitive ELISA is used to quantify inhibition of nsp7-nsp12 complexation, with one of the lead peptides giving an IC50 of 25 μM. Cell penetrability and cytotoxicity are characterized using a cargo delivery assay and an MTT cytotoxicity assay, respectively. Overall, this work presents a proof of concept of an approach for the rational discovery of peptide inhibitors of SARS-CoV-2 protein-protein interactions.
Affiliation(s)
- Akshay Chenna
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Wajihul Hasan Khan
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Virology Unit, Department of Microbiology, All India Institute of Medical Sciences, New Delhi, India
- Rozaleen Dash
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Saurabh Saraswat
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Archana Chugh
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Anurag S Rathore
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Gaurav Goel
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, India

175
Zaman AB, Inan TT, De Jong K, Shehu A. Adaptive Stochastic Optimization to Improve Protein Conformation Sampling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2759-2771. [PMID: 34882562 DOI: 10.1109/tcbb.2021.3134103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
It has long been known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. AlphaFold2 has now been shown to reveal a high-quality native structure for many proteins. However, researchers have argued over the years for broadening this view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploring a possibly very large structure space in search of competing, near-optimal structures. Here we report a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its exploration of the search space to balance exploration and exploitation within a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
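The abstract does not spell out the adaptive scheme. As a generic illustration of adapting exploration versus exploitation under a fixed evaluation budget (not the authors' algorithm), the sketch below uses the classic 1/5th success rule of evolution strategies on a toy objective. The objective, step-size factors, and window length are all assumptions chosen for illustration.

```python
import math
import random


def rastrigin(x):
    """Toy multimodal objective with its global minimum (0) at the origin.

    Stands in for a rugged energy landscape; it is not a protein energy function.
    """
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def adaptive_es(f, x0, budget=2000, sigma=1.0, window=20, seed=0):
    """(1+1) evolution strategy with Rechenberg's 1/5th success rule.

    Every `window` mutations the step size is adapted: if more than 1/5 of them
    were accepted, sigma grows (explore more widely); otherwise it shrinks
    (exploit the current region). Search stops after `budget` objective
    evaluations, counting the evaluation of x0.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    evals, successes = 1, 0
    while evals < budget:
        child = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = f(child)
        evals += 1
        if fc <= fx:  # accept children that are no worse than the parent
            x, fx = child, fc
            successes += 1
        if (evals - 1) % window == 0:  # one adaptation per `window` children
            sigma *= 1.22 if successes / window > 0.2 else 0.82
            successes = 0
    return x, fx, evals


best_x, best_f, used = adaptive_es(rastrigin, [3.0, -2.5], budget=2000)
```

The budget is enforced as a hard cap on objective evaluations, which is the usual way to compare stochastic optimizers fairly; the step-size rule is one of the simplest ways to trade exploration against exploitation as the search progresses.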
176
Cho Y, Ryu H, Lim G, Nam S, Lee J. Improving Geometric Validation Metrics and Ensuring Consistency with Experimental Data through TrioSA: An NMR Refinement Protocol. Int J Mol Sci 2023; 24:13337. [PMID: 37686144 PMCID: PMC10487420 DOI: 10.3390/ijms241713337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023]
Abstract
Protein model refinement is a crucial step in improving the quality of a predicted protein model. This study presents an NMR refinement protocol called TrioSA (torsion-angle and implicit-solvation-optimized simulated annealing) that improves the accuracy of backbone and side-chain conformations and the overall structural quality of proteins. TrioSA was applied to a subset of 3752 solution NMR protein structures accompanied by experimental NMR data (distance and dihedral angle restraints). We compared the initial NMR structures with the TrioSA-refined structures and found significant improvements in structural quality. In particular, we observed a reduction in both the number of NOE (nuclear Overhauser effect) violations and the size of the largest violation, indicating better agreement with the experimental NMR data. TrioSA improved the geometric validation metrics of NMR protein structures, including backbone accuracy and the secondary structure ratio. We evaluated the contribution of each refinement element and found that the torsion-angle potential played a significant role in improving the geometric validation metrics. In addition, we investigated protein-ligand docking to determine whether TrioSA can improve biological outcomes; TrioSA structures gave better binding predictions than the initial NMR structures. This study suggests that further development of computational refinement methods could improve biomolecular NMR structure determination.
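The two NOE statistics used in this comparison, the number of violations and the largest violation, can be computed from model interproton distances and the restraint upper bounds. A minimal sketch: the restraint representation, atom labels, distances, and the 0.5 Å reporting tolerance are all hypothetical and are not taken from the TrioSA study.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class NoeRestraint:
    atom_pair: Tuple[str, str]  # hypothetical atom labels, for bookkeeping only
    upper_bound: float          # upper distance bound, in angstroms


def noe_violations(restraints: List[NoeRestraint], distances: Sequence[float],
                   tolerance: float = 0.5) -> Tuple[int, float]:
    """Return (number of violations, largest violation) for one model.

    A restraint counts as violated when the model distance exceeds its upper
    bound by more than `tolerance` angstroms (an assumed reporting threshold).
    The largest violation is the maximum excess over all restraints.
    """
    excesses = [max(0.0, d - r.upper_bound) for r, d in zip(restraints, distances)]
    violated = [e for e in excesses if e > tolerance]
    return len(violated), max(excesses, default=0.0)


# Hypothetical restraints and distances before and after refinement.
restraints = [
    NoeRestraint(("H1", "H2"), 5.0),
    NoeRestraint(("H3", "H4"), 3.5),
    NoeRestraint(("H5", "H6"), 4.0),
]
initial = noe_violations(restraints, [5.8, 3.4, 4.9])  # 2 violations, max ~0.9 A
refined = noe_violations(restraints, [5.2, 3.3, 4.3])  # 0 violations, max ~0.3 A
```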
Affiliation(s)
- Youngbeom Cho
- Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon 34141, Republic of Korea
- Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Hyojung Ryu
- Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Gyutae Lim
- Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Seungyoon Nam
- Department of Genome Medicine and Science, AI Convergence Center for Medical Science, Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Republic of Korea
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology, Gachon University, Incheon 21999, Republic of Korea
- Jinhyuk Lee
- Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon 34141, Republic of Korea
- Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
177
Gordon CH, Hendrix E, He Y, Walker MC. AlphaFold Accurately Predicts the Structure of Ribosomally Synthesized and Post-Translationally Modified Peptide Biosynthetic Enzymes. Biomolecules 2023; 13:1243. [PMID: 37627309 PMCID: PMC10452190 DOI: 10.3390/biom13081243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 08/08/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a growing class of natural products biosynthesized from a genetically encoded precursor peptide. The enzymes that install the post-translational modifications on these peptides have the potential to be useful catalysts in the production of natural-product-like compounds and can install non-proteinogenic amino acids in peptides and proteins. However, engineering these enzymes has been somewhat limited, due in part to limited structural information on enzymes from the same families that nonetheless exhibit different substrate selectivities. Despite AlphaFold2's superior performance in single-chain protein structure prediction, its multimer version is less accurate and requires high-end GPUs, which most research groups do not have. Additionally, the default parameters of AlphaFold2 may not be optimal for predicting complex structures such as RiPP biosynthetic enzymes, owing to their dynamic binding and substrate-modifying mechanisms. This study assessed the efficacy of the structure prediction program ColabFold (a variant of AlphaFold2) in modeling RiPP biosynthetic enzymes in both monomeric and dimeric forms. Extensive benchmarking found no statistically significant differences in the accuracy of the predicted structures across the various prediction parameters examined, and ColabFold produced accurate models with its default parameters. We then generated additional structural predictions for selected RiPP biosynthetic enzymes from multiple protein families and biosynthetic pathways. Our findings can serve as a reference for future enzyme engineering complemented by AlphaFold-related tools.
Affiliation(s)
- Mark C. Walker
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
178
Zhang Z, Li C, Li Q, Su X, Li J, Zhu L, Lin XJ, Shen J. Structure prediction of novel isoforms from uveal melanoma by AlphaFold. Sci Data 2023; 10:513. [PMID: 37542084 PMCID: PMC10403560 DOI: 10.1038/s41597-023-02429-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 07/28/2023] [Indexed: 08/06/2023]
Abstract
Alternative splicing is an important mechanism that enhances protein functional diversity. To date, our understanding of alternative splicing variants has been based on mRNA transcript data; because protein structures are difficult to predict, the tertiary structures of these variants have remained largely unexplored. However, with the release of AlphaFold, which predicts three-dimensional models of proteins, this challenge is rapidly being overcome. Here, we present a dataset of 315 predicted structures of abnormal isoforms from 18 uveal melanoma patients, based on second- and third-generation transcriptome-sequencing data. This information comprises a high-quality set of structural data on recurrent aberrant isoforms that can be used in multiple types of studies, from those aimed at revealing potential therapeutic targets to those aimed at recognizing cancer neoantigens at the atomic level.
Affiliation(s)
- Zhe Zhang
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China.
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Chen Li
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China
- Qian Li
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaoming Su
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jiayi Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Lili Zhu
- Songjiang Research Institute and Songjiang Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201600, China
- Xinhua James Lin
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Jianfeng Shen
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China.
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China.
179
Fowler NJ, Albalwi MF, Lee S, Hounslow AM, Williamson MP. Improved methodology for protein NMR structure calculation using hydrogen bond restraints and ANSURR validation: The SH2 domain of SH2B1. Structure 2023; 31:975-986.e3. [PMID: 37311460 DOI: 10.1016/j.str.2023.05.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 05/02/2023] [Accepted: 05/18/2023] [Indexed: 06/15/2023]
Abstract
Protein structures calculated using NMR data are less accurate and less well defined than they could be. Here we use the program ANSURR to show that this deficiency is at least partly due to a lack of hydrogen bond restraints. We describe a protocol for introducing hydrogen bond restraints into the structure calculation of the SH2 domain from SH2B1 in a systematic and transparent way, and we show that the resulting structures are more accurate and better defined. We also show that ANSURR can be used as a guide to decide when the structure calculation is good enough to stop.
Affiliation(s)
- Nicholas J Fowler
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK.
- Marym F Albalwi
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Subin Lee
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Andrea M Hounslow
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK
- Mike P Williamson
- School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK.
180
Sala D, Engelberger F, Mchaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol 2023; 81:102645. [PMID: 37392556 DOI: 10.1016/j.sbi.2023.102645] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 03/21/2023] [Revised: 05/16/2023] [Accepted: 06/01/2023] [Indexed: 07/03/2023]
Abstract
Many proteins exert their function by switching among different structures. Knowing the conformational ensembles associated with these states is critical to elucidating key mechanistic aspects that govern protein function. While experimental determination is still bottlenecked by cost, time, and technical challenges, the machine-learning technology AlphaFold has shown near-experimental accuracy in predicting the three-dimensional structure of monomeric proteins. However, an ensemble of AlphaFold models usually represents a single conformational state with minimal structural heterogeneity. Consequently, several pipelines have been proposed to either expand the structural breadth of an ensemble or bias the prediction toward a desired conformational state. Here, we analyze how those pipelines work, what they can and cannot predict, and future directions.
Affiliation(s)
- D Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/sala_davide
- F Engelberger
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/fengel97
- H S Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA. https://twitter.com/Mchaourablab
- J Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany; Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany.
181
Wang Y, Guo Z, Cheng J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 2023; 39:btad458. [PMID: 37498561 PMCID: PMC10403428 DOI: 10.1093/bioinformatics/btad458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing the 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, because of the extreme sparsity and high noise of single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell. RESULTS In this work, we developed ScHiCEDRN, a deep learning method based on deep residual networks and generative adversarial networks, for the imputation and enhancement of single-cell Hi-C data. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) in enhancing the raw single-cell Hi-C data of human and Drosophila. The experiments also show that it can generate single-cell Hi-C data more suitable for identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than the existing methods. Moreover, ScHiCEDRN's performance generalizes well across different single cells and cell types, and it can also be applied to improve population Hi-C data. AVAILABILITY AND IMPLEMENTATION The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.
Affiliation(s)
- Yanli Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
182
Cui G, Jiang Z, Chen Y, Li Y, Ai S, Sun R, Yi X, Zhong G. Evolutional insights into the interaction between Rab7 and RILP in lysosome motility. iScience 2023; 26:107040. [PMID: 37534141 PMCID: PMC10391735 DOI: 10.1016/j.isci.2023.107040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 12/12/2022] [Accepted: 06/01/2023] [Indexed: 08/04/2023]
Abstract
Lysosome motility is critical for cellular function. However, Rab7-related transport elements show genetic differences between vertebrates and invertebrates, leaving the mechanism of lysosome motility unclear. We suggest that the Rab7-RILP interaction is a feature of more highly evolved organisms, since the two proteins interact in Spodoptera frugiperda but not in Drosophila melanogaster. The N-terminus of Sf-RILP was identified as necessary for their interaction, and Glu61 was proposed to be key to the stability of the interaction. A GC-rich domain in the C-terminal part of Sf-RILP hampered the expression of Sf-RILP and its interaction with Sf-Rab7. Although the amino acids at the C-terminus of Sf-RILP that correspond to vital residues in the mammalian model turned out to be neutral, the C-terminus would still help with the homologous interactions between RILP fragments in insects. The significantly different interactions in invertebrates shed light on the biodiversity and complexity of lysosome motility.
Affiliation(s)
- Gaofeng Cui
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization, Guangdong Academy of Forestry, Guangzhou 510520, China
- Zhiyan Jiang
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Yaoyao Chen
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Yun Li
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Shupei Ai
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Ranran Sun
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Xin Yi
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Guohua Zhong
- College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
183
Hetmann M, Langner C, Durmaz V, Cespugli M, Köchl K, Krassnigg A, Blaschitz K, Groiss S, Loibner M, Ruau D, Zatloukal K, Gruber K, Steinkellner G, Gruber CC. Identification and validation of fusidic acid and flufenamic acid as inhibitors of SARS-CoV-2 replication using DrugSolver CavitomiX. Sci Rep 2023; 13:11783. [PMID: 37479788 PMCID: PMC10362000 DOI: 10.1038/s41598-023-39071-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/26/2023] [Accepted: 07/19/2023] [Indexed: 07/23/2023]
Abstract
In this work, we present DrugSolver CavitomiX, a novel computational pipeline for drug repurposing and for identifying ligands and inhibitors of target enzymes. The pipeline is based on cavity point clouds representing physico-chemical properties of the cavity induced solely by the protein. To test the pipeline's ability to identify inhibitors, we chose enzymes essential for SARS-CoV-2 replication as a test system. The active-site cavities of the viral enzymes main protease (Mpro) and papain-like protease (PLpro), as well as of the human transmembrane serine protease 2 (TMPRSS2), were selected as target cavities. Using active-site point-cloud comparisons, it was possible to identify two compounds, flufenamic acid and fusidic acid, which strongly inhibit viral replication. The complexes from which fusidic acid and flufenamic acid were derived would not have been identified using classical sequence- and structure-based methods, as they show very little structural similarity (TM-scores of 0.1 and 0.09, respectively) and very low sequence identity (~5%) to Mpro and TMPRSS2, respectively. Furthermore, a cavity-based off-target screening was performed using acetylcholinesterase (AChE) as an example. Cavity comparisons successfully identified human carboxylesterase, a previously described off-target of AChE inhibitors.
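For context on the structural-similarity values quoted here, the standard TM-score for a given superposition is TM = (1/L) Σ 1/(1 + (d_i/d0)²), summed over aligned residue pairs, where L is the length of the reference protein, d_i is the distance between the i-th aligned pair, and d0 = 1.24 ∛(L − 15) − 1.8 Å. Values below roughly 0.17 are commonly cited as typical of unrelated random structure pairs, while values above 0.5 generally indicate the same fold, so scores of 0.1 and 0.09 mean essentially no global structural similarity. The minimal sketch below evaluates the score for one fixed superposition only. The real TM-score maximizes over superpositions, and the 0.5 Å floor on d0 for short chains follows common practice and is an assumption here.

```python
from typing import Sequence


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (no search over superpositions).

    distances: per-residue distances (angstroms) between aligned residue pairs
    after superposition; target_length: length L of the reference protein.
    """
    if target_length <= 15:
        d0 = 0.5  # assumed floor for very short chains
    else:
        d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    # Each aligned pair contributes between 0 (far apart) and 1 (superimposed).
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


perfect = tm_score_fixed([0.0] * 100, 100)    # identical coordinates
distant = tm_score_fixed([100.0] * 100, 100)  # every pair far apart
```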
Affiliation(s)
- M Hetmann
- Innophore, San Francisco, CA, USA
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- C Langner
- Diagnostic and Research Institute of Pathology, Medical University of Graz, Graz, Austria
- V Durmaz
- Innophore, San Francisco, CA, USA
- K Köchl
- Innophore, San Francisco, CA, USA
- S Groiss
- Diagnostic and Research Institute of Pathology, Medical University of Graz, Graz, Austria
- M Loibner
- Diagnostic and Research Institute of Pathology, Medical University of Graz, Graz, Austria
- D Ruau
- NVIDIA, Santa Clara, CA, USA
- K Zatloukal
- Diagnostic and Research Institute of Pathology, Medical University of Graz, Graz, Austria
- K Gruber
- Innophore, San Francisco, CA, USA
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
- G Steinkellner
- Innophore, San Francisco, CA, USA
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Field of Excellence BioHealth - University of Graz, Graz, Austria
- C C Gruber
- Innophore, San Francisco, CA, USA.
- Institute of Molecular Biosciences, University of Graz, Graz, Austria.
- Austrian Centre of Industrial Biotechnology, Graz, Austria.
- Field of Excellence BioHealth - University of Graz, Graz, Austria.
184
Yin R, Pierce BG. Evaluation of AlphaFold Antibody-Antigen Modeling with Implications for Improving Predictive Accuracy. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.05.547832. [PMID: 37461571 PMCID: PMC10349958 DOI: 10.1101/2023.07.05.547832] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 07/24/2023]
Abstract
High-resolution antibody-antigen structures provide critical insights into immune recognition and can inform therapeutic design. The challenges of experimental structure determination and the diversity of the immune repertoire underscore the need for accurate computational tools for modeling antibody-antigen complexes. Initial benchmarking showed that, despite overall success in modeling protein-protein complexes, AlphaFold and AlphaFold-Multimer have had limited success in modeling antibody-antigen interactions. In this study, we performed a thorough analysis of AlphaFold's antibody-antigen modeling performance on 429 nonredundant antibody-antigen complex structures, identifying useful confidence metrics for predicting model quality and features of complexes associated with improved modeling success. We show the importance of bound-like component modeling for complex assembly accuracy, and that the current version of AlphaFold raises the near-native modeling success rate to over 30%, versus approximately 20% for a previous version. With this improved success, AlphaFold can generate accurate antibody-antigen models in many cases, while additional training may further improve its performance.
Affiliation(s)
- Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
185
Wu KE, Zou JY, Chang H. Machine learning modeling of RNA structures: methods, challenges and future perspectives. Brief Bioinform 2023; 24:bbad210. [PMID: 37280185 DOI: 10.1093/bib/bbad210] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/13/2023] [Revised: 05/12/2023] [Accepted: 05/17/2023] [Indexed: 06/08/2023]
Abstract
The three-dimensional structure of RNA molecules plays a critical role in a wide range of cellular processes, encompassing functions from riboswitches to epigenetic regulation. These RNA structures are highly dynamic and are aptly described as an ensemble of structures whose distribution shifts with cellular conditions. Thus, the computational prediction of RNA structure poses a unique challenge, even as computational protein folding has seen great advances. In this review, we focus on a variety of machine learning-based methods that have been developed to predict the secondary structure of RNA molecules, as well as their more complex tertiary structures. We survey commonly used modeling strategies, many of which are inspired by or incorporate thermodynamic principles. We then discuss the shortcomings that various design decisions entail and propose future directions that could build on these methods to yield more robust, accurate RNA structure predictions.
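As a point of reference for the secondary-structure problem discussed in this review, the classic Nussinov dynamic program finds the maximum number of nested base pairs. It is a non-ML baseline and a strong simplification of the thermodynamic (free-energy) models the review mentions, not one of the surveyed methods. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 are assumptions chosen for illustration.

```python
from typing import Tuple

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Return (max number of nested base pairs, dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans of length <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in ALLOWED:
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    # Traceback: re-derive which case produced each optimal value.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in ALLOWED:
                right = dp[k + 1][j] if k + 1 <= j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    pairs.append((i, k))
                    stack.append((i + 1, k - 1))
                    if k + 1 <= j:
                        stack.append((k + 1, j))
                    break

    structure = ["."] * n
    for i, k in pairs:
        structure[i], structure[k] = "(", ")"
    return dp[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAACCC")` yields three stacked G-C pairs closing an AAA hairpin loop. Maximizing pair count ignores stacking energies and loop penalties, which is exactly the gap that energy-based and learned scoring functions aim to fill.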
Affiliation(s)
- Kevin E Wu
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- James Y Zou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Chang
- Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
186
Lomoio U, Puccio B, Tradigo G, Guzzi PH, Veltri P. SARS-CoV-2 protein structure and sequence mutations: Evolutionary analysis and effects on virus variants. PLoS One 2023; 18:e0283400. [PMID: 37471335 PMCID: PMC10358949 DOI: 10.1371/journal.pone.0283400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 07/04/2023] [Indexed: 07/22/2023]
Abstract
The structure and sequence of proteins strongly influence their biological functions. New models and algorithms can help researchers understand how the evolution of sequences and structures relates to changes in function. Recently, studies of SARS-CoV-2 Spike (S) protein structures have been performed to predict binding receptors and infection activity in COVID-19; hence, scientific interest has arisen in the effects of virus mutations with respect to sequence, structure and vaccination. However, models and tools are needed to study the links between the evolution of S protein sequence, structure and function, virus transmissibility, and the effects of vaccination. Because studies of the S protein have generated a large amount of relevant information, we propose in this work to use Protein Contact Networks (PCNs) to relate protein structures to biological properties by means of network topology properties. Topological properties are used to compare structural changes with sequence changes. We find that both node centrality and community extraction analysis can be used to relate protein stability and functionality to sequence mutations. Building on this, we compare structural evolution with sequence changes and study mutations from a temporal perspective, focusing on virus variants. Finally, by applying our model to the Omicron variant, we report a timeline correlation between Omicron and the vaccination campaign.
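Protein contact networks are commonly built by treating residues as nodes and linking residues whose C-alpha atoms lie within a distance cutoff; node centralities then summarize each residue's structural role. A minimal sketch, assuming C-alpha coordinates as input and a single 8 Å cutoff; the abstract does not give the study's exact network definition, and the coordinates below are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_network(coords: Sequence[Coord], cutoff: float = 8.0) -> Dict[int, Set[int]]:
    """Adjacency sets linking residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(coords)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """Fraction of the other residues each residue contacts."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """(reachable nodes - 1) / sum of shortest-path lengths, using BFS on the
    unweighted graph; computed within each node's connected component."""
    result: Dict[int, float] = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical straight chain of four C-alpha atoms spaced 3.8 A apart.
coords: List[Coord] = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
adj = contact_network(coords)
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
```

Comparing such centrality values between the networks of two variants is one simple way to see which residues change structural role after a mutation.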
Affiliation(s)
- Ugo Lomoio
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
- Barbara Puccio
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
187
Kakoulidis P, Vlachos IS, Thanos D, Blatch GL, Emiris IZ, Anastasiadou E. Identifying and profiling structural similarities between Spike of SARS-CoV-2 and other viral or host proteins with Machaon. Commun Biol 2023; 6:752. [PMID: 37468602 PMCID: PMC10356814 DOI: 10.1038/s42003-023-05076-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 06/26/2023] [Indexed: 07/21/2023]
Abstract
Using protein structure to predict function, interactions, and evolutionary history is still an open challenge, with existing approaches relying extensively on protein homology and families. Here, we present Machaon, a data-driven method combining orientation-invariant metrics on phi-psi angles, inter-residue contacts and surface complexity. It can be readily applied to whole structures or to segments, such as domains and binding sites. Machaon was applied to SARS-CoV-2 Spike monomers of the native, Delta and Omicron variants and identified correlations with a wide range of viral proteins, from close to distant taxonomic ranks, as well as with host proteins such as the ACE2 receptor. Machaon's meta-analysis of the results highlights structural, chemical and transcriptional similarities between the Spike monomer and human proteins, indicating multi-level viral mimicry. This extended analysis also revealed relationships of the Spike protein with biological processes such as ubiquitination and angiogenesis and highlighted different patterns of virus attachment among the studied variants. Available at: https://machaonweb.com.
Affiliation(s)
- Panos Kakoulidis
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
- Ioannis S Vlachos
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main St., Cambridge, MA, 02142, USA
- Cancer Research Institute, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
- Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Dana Building, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Dimitris Thanos
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
- Gregory L Blatch
- Biomedical Biotechnology Research Unit, Department of Biochemistry and Microbiology, Rhodes University, PO Box 94, Makhanda (Grahamstown) 6140, Eastern Cape, South Africa
- Biomedical and Drug Discovery Research Group, Faculty of Health Sciences, Higher Colleges of Technology, PO 25026, Sharjah, UAE
- Institute for Health and Sport, Victoria University, Melbourne, PO Box 14428, VIC 8001, Melbourne, Australia
- The Vice Chancellery, The University of Notre Dame Australia, PO Box 1225, WA 6959, Fremantle, Australia
- Ioannis Z Emiris
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- ATHENA Research and Innovation Center, Artemidos 6 & Epidavrou 15125, Marousi, Greece
- Ema Anastasiadou
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece.
188
Jiao Y, Xing Y, Sun Y. Impact of E484Q and L452R Mutations on Structure and Binding Behavior of SARS-CoV-2 B.1.617.1 Using Deep Learning AlphaFold2, Molecular Docking and Dynamics Simulation. Int J Mol Sci 2023; 24:11564. [PMID: 37511322 PMCID: PMC10380202 DOI: 10.3390/ijms241411564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
During the COVID-19 outbreak, many SARS-CoV-2 variants carried key amino acid mutations that influenced their binding to human angiotensin-converting enzyme 2 (hACE2) and to neutralizing antibodies. For the B.1.617 lineage, there were concerns that two key mutations, L452R and E484Q, would have additive effects on the evasion of neutralizing antibodies. In this paper, we systematically investigated the impact of the L452R and E484Q mutations on the structure and binding behavior of B.1.617.1 using deep learning (AlphaFold2), molecular docking and molecular dynamics simulation. We first predicted the structure of the S protein containing the L452R and E484Q mutations, verified it using the AlphaFold2-calculated pLDDT values, and compared it with the experimental structure. Next, molecular simulations were performed to assess the structural and interaction stability of the double mutant S protein in complex with hACE2. We found that the double mutations L452R and E484Q could reduce the number of hydrogen bonds and raise the interaction energy between the S protein and hACE2, indicating lower structural stability and weaker binding affinity over the longer dynamic process, even though molecular docking gave a lower binding energy score for the S1 RBD of the double mutant variant with hACE2 than for the wild type (WT) with hACE2. In addition, docking to three approved neutralizing monoclonal antibodies (mAbs) showed reduced binding affinity for the double mutant variant, suggesting that these mAbs neutralize it less effectively. Our study helps lay the foundation for further SARS-CoV-2 studies and provides bioinformatic and computational insight into how the double mutations lead to immune evasion, which could guide subsequent biomedical studies.
Affiliation(s)
- Yanqi Jiao
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Yichen Xing
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Yao Sun
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
189
Vallat B, Tauriello G, Bienert S, Haas J, Webb BM, Žídek A, Zheng W, Peisach E, Piehl DW, Anischanka I, Sillitoe I, Tolchard J, Varadi M, Baker D, Orengo C, Zhang Y, Hoch JC, Kurisu G, Patwardhan A, Velankar S, Burley SK, Sali A, Schwede T, Berman HM, Westbrook JD. ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models. J Mol Biol 2023; 435:168021. [PMID: 36828268 PMCID: PMC10293049 DOI: 10.1016/j.jmb.2023.168021] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/29/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/24/2023]
Abstract
ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled by solely computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally-determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, and tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.
Affiliation(s)
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.
- Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Juergen Haas
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Benjamin M Webb
- Department of Bioengineering and Therapeutic Sciences, the Quantitative Biosciences Institute (QBI), and the Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94157, USA
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ivan Anischanka
- Department of Biochemistry, and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Ian Sillitoe
- Department of Structural and Molecular Biology, UCL, London, UK
- James Tolchard
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Mihaly Varadi
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- David Baker
- Department of Biochemistry, and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jeffrey C Hoch
- Biological Magnetic Resonance Data Bank, Department of Molecular Biology and Biophysics, University of Connecticut, Farmington, CT 06030, USA
- Genji Kurisu
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
- Ardan Patwardhan
- Electron Microscopy Data Bank, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Sameer Velankar
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, the Quantitative Biosciences Institute (QBI), and the Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94157, USA. https://twitter.com/salilab_ucsf
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
190
Shivnauth V, Pretheepkumar S, Marchetta EJR, Rossi CAM, Amani K, Castroverde CDM. Structural diversity and stress regulation of the plant immunity-associated CALMODULIN-BINDING PROTEIN 60 (CBP60) family of transcription factors in Solanum lycopersicum (tomato). Funct Integr Genomics 2023; 23:236. [PMID: 37439880 DOI: 10.1007/s10142-023-01172-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/23/2023] [Accepted: 07/08/2023] [Indexed: 07/14/2023]
Abstract
Cellular signaling generates calcium (Ca2+) ions, which are ubiquitous secondary messengers decoded by calcium-dependent protein kinases, calcineurins, calreticulin, calmodulins (CAMs), and CAM-binding proteins. Previous studies in the model plant Arabidopsis thaliana have shown the critical roles of the CAM-BINDING PROTEIN 60 (CBP60) protein family in plant growth, stress responses, and immunity. Certain CBP60 factors can regulate plant immune responses, such as pattern-triggered immunity, effector-triggered immunity, and the synthesis of the major plant immune-activating metabolites salicylic acid (SA) and N-hydroxypipecolic acid (NHP). Although homologous CBP60 sequences have been identified across the plant kingdom, their function and regulation in most species remain unclear. In this paper, we characterized 11 members of the CBP60 family in the agriculturally important crop tomato (Solanum lycopersicum). Protein sequence analyses revealed that three tomato CBP60 homologs share the highest amino acid identity with Arabidopsis CBP60g and SARD1, master transcription factors involved in plant immunity. Strikingly, AlphaFold deep learning-assisted prediction of protein structures highlighted close structural similarity between these tomato and Arabidopsis CBP60 homologs. Conserved domain analyses revealed that they possess CAM-binding domains and DNA-binding domains, reflecting their potential involvement in linking Ca2+ signaling and transcriptional regulation in tomato plants. In terms of their gene expression profiles under biotic stress (infection with the pathogen Pseudomonas syringae pv. tomato DC3000) and/or abiotic stress (warming temperatures), five tomato CBP60 genes were pathogen-responsive and temperature-sensitive, reminiscent of Arabidopsis CBP60g and SARD1. Overall, we present a genome-wide identification of the CBP60 gene/protein family in tomato plants and provide evidence for their regulation and potential function as Ca2+-sensing transcriptional regulators.
Affiliation(s)
- Vanessa Shivnauth
- Department of Biology, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada
- Sonya Pretheepkumar
- Department of Biology, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada
- Eric J R Marchetta
- Department of Biology, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada
- Christina A M Rossi
- Department of Biology, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada
- Keaun Amani
- Department of Biology, Wilfrid Laurier University, Waterloo, ON, N2L 3C5, Canada
191
Rahaman MM, Khan NS, Zhang S. RNAMotifComp: a comprehensive method to analyze and identify structurally similar RNA motif families. Bioinformatics 2023; 39:i337-i346. [PMID: 37387191 DOI: 10.1093/bioinformatics/btad223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION The 3D structures of RNA are critical to understanding RNA function. Several computational methods exist for studying RNA 3D structures by identifying structural motifs and categorizing them into motif families based on their structures. Although the number of such motif families is not limited, only a few of them are well studied. Among these structural motif families, several are visually similar or very close in structure despite having different base interactions. Conversely, some motif families share a set of base interactions but vary in their 3D conformations. These similarities among motif families, if known, can provide better insight into RNA 3D structural motifs as well as their characteristic functions in cell biology. RESULTS In this work, we propose a method, RNAMotifComp, that analyzes the instances of well-known structural motif families and establishes a relational graph among them. We have also designed a method to visualize the relational graph, in which families are shown as nodes and their similarity information is represented as edges. We validated the discovered correlations among motif families using RNAMotifContrast. Additionally, we used a basic Naïve Bayes classifier to show the importance of RNAMotifComp. The relational analysis explains the functional analogies of divergent motif families and illustrates situations where motifs from disparate families are predicted to belong to the same family. AVAILABILITY AND IMPLEMENTATION Source code is publicly available at https://github.com/ucfcbb/RNAMotifFamilySimilarity.
Affiliation(s)
- Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Nabila Shahnaz Khan
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
192
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 07/01/2023]
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
193
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.13.532420. [PMID: 37425675 PMCID: PMC10326970 DOI: 10.1101/2023.03.13.532420] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/11/2023]
Abstract
Although previously thought to be unlikely, de novo gene origination from previously non-genic sequences has recently been shown to be a relatively common mechanism of gene innovation in many species and taxa. These young genes provide a unique set of candidates for studying the structural and functional origination of proteins. However, our understanding of their protein structures, and of how these structures originate and evolve, is still limited owing to a lack of systematic studies. Here, we combined high-quality base-level whole-genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with gene age, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall change in protein structure for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well folded, many of which are more likely than other annotated protein-coding genes to encode transmembrane and signal proteins. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins were born folded. Interestingly, we observed one case in which disordered ancestral proteins became ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that, although most de novo genes are enriched in spermatocytes, several young de novo genes are biased toward the early stage of spermatogenesis, indicating potentially important but less emphasized roles of early germline cells in de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
- Li Zhao
- Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
194
Choi J. Narrow funnel-like interaction energy distribution is an indicator of specific protein interaction partner. iScience 2023; 26:106911. [PMID: 37305691 PMCID: PMC10250834 DOI: 10.1016/j.isci.2023.106911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 04/28/2023] [Accepted: 05/12/2023] [Indexed: 06/13/2023]
Abstract
Protein interaction networks underlie countless biological mechanisms. However, most protein interaction predictions rely either on biological evidence, which is biased toward well-known protein interactions, or on physical evidence, which has low accuracy for weak interactions and requires high computational power. In this study, a novel method was proposed to predict protein interaction partners by examining narrow funnel-like interaction energy distributions. It was demonstrated that various protein interactions, including those of kinases and E3 ubiquitin ligases, have narrow funnel-like interaction energy distributions. To analyze these distributions, modified iRMS and TM-score metrics were introduced. Using these scores, an algorithm and a deep learning model were then developed to predict protein interaction partners and the substrates of kinases and E3 ubiquitin ligases. The prediction accuracy was similar to, or even better than, that of yeast two-hybrid screening. Ultimately, this knowledge-free protein interaction prediction method will broaden our understanding of protein interaction networks.
Affiliation(s)
- Juyoung Choi
- Department of Life Science, Sogang University, Seoul 04017, South Korea
195
Trebesch N, Tajkhorshid E. Structure Reveals Homology in Elevator Transporters. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.14.544989. [PMID: 37398459 PMCID: PMC10312693 DOI: 10.1101/2023.06.14.544989] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/04/2023]
Abstract
The elevator transport mechanism is one of a handful of canonical mechanisms by which transporters shuttle their substrates across the semi-permeable membranes that surround cells and organelles. Studies of molecular function are naturally guided by evolutionary context, but until now this context has been limited for elevator transporters because established evolutionary classification methods have organized them into several apparently unrelated families. Through comprehensive examination of the pertinent structures available in the Protein Data Bank, we show that 62 elevator transporters from 18 families share a conserved architecture in their transport domains, consisting of 10 helices connected in 8 topologies. Through quantitative analysis of the structural similarity, structural complexity, and topologically corrected sequence similarity among the transport domains, we provide compelling evidence that these elevator transporters are all homologous. Using our analysis, we have constructed a phylogenetic tree to enable quantification and visualization of the evolutionary relationships among elevator transporters and their families. We also report several examples of functional features that are shared by elevator transporters from different families. Our findings shed new light on the elevator transport mechanism and allow us to understand it in a far deeper and more nuanced manner.
Affiliation(s)
- Noah Trebesch
- Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
- Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
196
Adiyaman R, Edmunds NS, Genc AG, Alharbi SMA, McGuffin LJ. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. BIOINFORMATICS ADVANCES 2023; 3:vbad078. [PMID: 37359722 PMCID: PMC10290552 DOI: 10.1093/bioadv/vbad078] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/24/2023] [Revised: 05/09/2023] [Accepted: 06/13/2023] [Indexed: 06/28/2023]
Abstract
Motivation The accuracy gap between predicted and experimental structures has been significantly reduced following the development of AlphaFold2 (AF2). However, for many targets, AF2 models still have room for improvement. In previous CASP experiments, highly computationally intensive MD simulation-based methods have been widely used to improve the accuracy of single 3D models. Here, our ReFOLD pipeline was adapted to refine AF2 predictions while maintaining high model accuracy at a modest computational cost. Furthermore, the AF2 recycling process was used to improve 3D models by supplying them as custom template inputs for tertiary and quaternary structure predictions. Results According to the MolProbity score, 94% of the 3D models generated by ReFOLD were improved. AF2 recycling showed an improvement rate of 87.5% (using MSAs) and 81.25% (using single sequences) for monomeric AF2 models and 100% (MSA) and 97.8% (single sequence) for monomeric non-AF2 models, as measured by the average change in lDDT. By the same measure, the recycling of multimeric models showed an improvement rate of as much as 80% for AF2-Multimer (AF2M) models and 94% for non-AF2M models. Availability and implementation Refinement using AlphaFold2-Multimer recycling is available as part of the MultiFOLD docker package (https://hub.docker.com/r/mcguffin/multifold). The ReFOLD server is available at https://www.reading.ac.uk/bioinf/ReFOLD/ and the modified scripts can be downloaded from https://www.reading.ac.uk/bioinf/downloads/. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
- Nicholas S Edmunds
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
- Ahmet G Genc
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
- Shuaa M A Alharbi
- School of Biological Sciences, University of Reading, Reading RG6 6EX, UK
197
Pogozheva ID, Cherepanov S, Park SJ, Raghavan M, Im W, Lomize AL. Structural modeling of cytokine-receptor-JAK2 signaling complexes using AlphaFold Multimer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.14.544971. [PMID: 37398331 PMCID: PMC10312770 DOI: 10.1101/2023.06.14.544971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Homodimeric class 1 cytokine receptors include the erythropoietin (EPOR), thrombopoietin (TPOR), granulocyte colony-stimulating factor 3 (CSF3R), growth hormone (GHR), and prolactin (PRLR) receptors. They are cell-surface single-pass transmembrane (TM) glycoproteins that regulate cell growth, proliferation, and differentiation and can induce oncogenesis. An active TM signaling complex consists of a receptor homodimer, one or two ligands bound to the receptor extracellular domains, and two molecules of Janus Kinase 2 (JAK2) constitutively associated with the receptor intracellular domains. Although crystal structures of soluble extracellular domains with ligands have been obtained for all the receptors except TPOR, little is known about the structure and dynamics of the complete TM complexes that activate the downstream JAK-STAT signaling pathway. Three-dimensional models of five human receptor complexes with cytokines and JAK2 were generated using AlphaFold Multimer. Given the large size of the complexes (from 3220 to 4074 residues), the modeling required a stepwise assembly from smaller parts, with selection and validation of the models through comparisons with published experimental data. The modeling of active and inactive complexes supports a general activation mechanism that involves ligand binding to a monomeric receptor, followed by receptor dimerization and rotational movement of the receptor TM α-helices, causing proximity, dimerization, and activation of the associated JAK2 subunits. A binding mode of two eltrombopag molecules to the TM α-helices of the active TPOR dimer was proposed. The models also help elucidate the molecular basis of oncogenic mutations that may involve a non-canonical activation route. Models equilibrated in explicit lipids of the plasma membrane are publicly available.
Affiliation(s)
- Irina D. Pogozheva
- Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, United States
- Sang-Jun Park
- Departments of Biological Sciences and Chemistry, Lehigh University, Bethlehem, PA 18015, United States
- Malini Raghavan
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, United States
- Wonpil Im
- Departments of Biological Sciences and Chemistry, Lehigh University, Bethlehem, PA 18015, United States
- Andrei L. Lomize
- Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, United States
198
Kaur G, Prajapat M, Singh H, Sarma P, Bhadada SK, Shekhar N, Sharma S, Sinha S, Kumar S, Prakash A, Medhi B. Investigating the novel-binding site of RPA2 on Menin and predicting the effect of point mutation of Menin through protein-protein interactions. Sci Rep 2023; 13:9337. [PMID: 37291166 PMCID: PMC10250348 DOI: 10.1038/s41598-023-35599-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/20/2023] [Indexed: 06/10/2023]
Abstract
Protein-protein interactions (PPIs) play a critical role in all biological processes. Menin is a tumor suppressor protein that is mutated in multiple endocrine neoplasia type 1 syndrome. It has been shown to interact with multiple transcription factors, including the RPA2 subunit of replication protein A (RPA), a heterotrimeric protein required for DNA repair, recombination, and replication. However, the specific amino acid residues involved in the menin-RPA2 interaction remain unclear. Accurately predicting these residues, and the effects of MEN1 mutations on biological systems, is therefore of great interest. Experimental approaches for identifying the residues in menin-RPA2 interactions are expensive, time-consuming, and challenging. This study uses computational tools, free energy decomposition, and a configurational entropy scheme to annotate the menin-RPA2 interaction and the effect of a menin point mutation on it, thereby proposing a viable model of the menin-RPA2 interaction. The interaction pattern was calculated from several 3D structures of menin-RPA2 complexes built by homology modeling and docking. This yielded three best-fit models: Model 8 (-74.89 kJ/mol), Model 28 (-92.04 kJ/mol), and Model 9 (-100.4 kJ/mol). Molecular dynamics (MD) simulations were run for 200 ns. Binding free energies and energy decomposition were then calculated with the Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) method in GROMACS. Model 8 of menin-RPA2 had the most negative binding free energy (-205.624 kJ/mol), followed by model 28 (-177.382 kJ/mol). After the S606F point mutation in menin, the binding free energy (BFE, ΔGbind) of mutant model 8 changed by -34.09 kJ/mol. Interestingly, mutant model 28 showed a significant reduction in BFE (ΔGbind) and configurational entropy compared with the wild type, by -97.54 kJ/mol and -2618 kJ/mol, respectively.
Collectively, this is the first study to highlight the configurational entropy of protein-protein interactions. It thereby strengthens the prediction of two important interaction sites in menin for RPA2 binding. After missense mutations in menin, these predicted sites could be vulnerable to structural alteration in terms of binding free energy and configurational entropy.
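The MM/PBSA quantities reported in this abstract are usually defined as follows. This is the generic textbook form, not taken from the paper. The sign convention for the mutation effect (mutant minus wild type) is an assumption, because the abstract does not state it. Many MM/PBSA workflows also omit the entropy term, which the authors estimate separately as configurational entropy.

```latex
\begin{align*}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{menin}} + G_{\mathrm{RPA2}} \right) \\
G_{X} &= \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{PB}} \rangle + \langle G_{\mathrm{SA}} \rangle - T S_{X} \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} \\
\Delta\Delta G_{\mathrm{bind}} &= \Delta G_{\mathrm{bind}}^{\mathrm{S606F}} - \Delta G_{\mathrm{bind}}^{\mathrm{WT}}
\end{align*}
```

In these equations, $G_{\mathrm{PB}}$ is the polar solvation term from the Poisson-Boltzmann equation and $G_{\mathrm{SA}}$ is the nonpolar, surface-area-based term. Under this convention, a negative $\Delta\Delta G_{\mathrm{bind}}$ means the mutant is predicted to bind more favorably than the wild type.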
Affiliation(s)
- Gurjeet Kaur
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Manisha Prajapat
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Harvinder Singh
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Phulen Sarma
- Department of Pharmacology, AIIMS, Guwahati, India
- Sanjay Kumar Bhadada
- Department of Endocrinology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
- Nishant Shekhar
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Saurabh Sharma
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Shweta Sinha
- Department of Experimental Medicine and Biotechnology, PGIMER, Chandigarh, India
- Subodh Kumar
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Ajay Prakash
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
- Bikash Medhi
- Department of Pharmacology, Postgraduate Institute of Medical Education and Research (PGIMER), Research Block B, 4th Floor, Lab No 4044, Chandigarh, 160012, India
199
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK, Sundaram L, Wu Y, Adhikari A, Field Y, Chen C, Batzoglou S, Aguet F, Lemire G, Reimers R, Balick D, Janiak MC, Kuhlwilm M, Orkin JD, Manu S, Valenzuela A, Bergman J, Rousselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, do Amaral JV, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Bataillon T, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin A, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Lek M, Sunyaev S, O'Donnell-Luria A, Rehm HL, Xu J, Rogers J, Marques-Bonet T, Farh KKH. The landscape of tolerated genetic variation in humans and primates. Science 2023; 380:eabn8153. [PMID: 37262156 DOI: 10.1126/science.abn8197] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Received: 12/31/2021] [Accepted: 03/22/2023] [Indexed: 06/03/2023]
Abstract
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
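The abstract describes a two-tier labeling scheme. Variants seen at high frequency in other primates are called likely benign, and the remainder are left to a deep learning model. That logic can be sketched as below. This is a hypothetical illustration, not the authors' pipeline. The `Variant` type, the `label_variants` function, the `score_fn` callback that stands in for the deep learning model, and the 0.001 allele-frequency cutoff are all assumptions; the abstract says only "high allele frequencies".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cutoff: the abstract only says "high allele frequencies".
COMMON_AF_CUTOFF = 0.001


@dataclass
class Variant:
    """A human protein-altering variant plus the allele frequency of its
    ortholog in each sampled primate species (species name -> frequency)."""
    variant_id: str
    primate_afs: Dict[str, float] = field(default_factory=dict)


def label_variants(
    variants: List[Variant],
    score_fn: Callable[[Variant], float],
    cutoff: float = COMMON_AF_CUTOFF,
) -> Dict[str, Tuple[str, Optional[float]]]:
    """Label variants common in any primate as likely benign; pass the rest
    to score_fn, a placeholder for a trained pathogenicity model."""
    labels: Dict[str, Tuple[str, Optional[float]]] = {}
    for v in variants:
        # Highest frequency observed across primate species (0.0 if never seen).
        max_af = max(v.primate_afs.values(), default=0.0)
        if max_af >= cutoff:
            labels[v.variant_id] = ("likely_benign", None)
        else:
            labels[v.variant_id] = ("model_scored", score_fn(v))
    return labels


if __name__ == "__main__":
    demo = [
        Variant("var1", {"Macaca mulatta": 0.12}),
        Variant("var2", {"Pan troglodytes": 0.0002}),
        Variant("var3"),
    ]
    # A constant score stands in for the deep learning model.
    print(label_variants(demo, score_fn=lambda v: 0.5))
    # -> {'var1': ('likely_benign', None), 'var2': ('model_scored', 0.5), 'var3': ('model_scored', 0.5)}
```

Keeping the scorer as a callback separates the frequency-based filter from whatever model assigns scores to the remaining variants.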
Affiliation(s)
- Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Tobias Hamp
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Jeffrey Ede
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Jeremy McRae
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Yanshen Yang
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Petko P Fiziev
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Lukas F K Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Yibing Wu
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Aashish Adhikari
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Yair Field
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Chen Chen
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Serafim Batzoglou
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Gabrielle Lemire
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Rebecca Reimers
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Daniel Balick
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Mareike C Janiak
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Martin Kuhlwilm
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, 1030 Vienna, Austria
- Joseph D Orkin
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Département d'anthropologie, Université de Montréal, 3150 Jean-Brillant, Montréal, QC H3T 1N8, Canada
- Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Alejandro Valenzuela
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
- Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University, 8000 Aarhus, Denmark
- Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Av. Franklin D. Roosevelt 50, CP 160/12, B-1050 Brussels, Belgium
- Lidia Agueda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Dorien de Vries
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Ian Goodhead
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
- Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- David Juan
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT 84102, USA
- Iracilda Sampaio
- Universidade Federal do Pará, Guamá, Belém - PA, 66075-110, Brazil
- Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- João Valsecchi do Amaral
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Amazonas, 69553-225, Brazil
- Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia - RedeFauna, Manaus, Amazonas, 69080-900, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica - ComFauna, Iquitos, Loreto, 16001, Peru
- Mariluce Messias
- Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
- PPGREN - Programa de Pós-Graduação "Conservação e Uso dos Recursos Naturais" and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
- Maria N F da Silva
- Instituto Nacional de Pesquisas da Amazonia, Petrópolis, Manaus - AM, 69067-375, Brazil
- Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Rogerio Rossi
- Universidade Federal do Mato Grosso, Boa Esperança, Cuiabá - MT, 78060-900, Brazil
- Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- Department of Biology, Trinity University, San Antonio, TX 78212, USA
- Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
- Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
- Eduardo Fernandez-Duque
- Yale University, New Haven, CT 06520, USA
- Universidad Nacional de Formosa, Argentina; Fundación ECO, Formosa, Argentina
- Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, PoB 16316, Addis Ababa 1000, Ethiopia
- Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Long Zhou
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China
- Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Shangcheng District, Hangzhou 310006, China
- Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Head Office, P.O. Box 661, Arusha, Tanzania
- Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, 17493 Greifswald - Insel Riems, Germany
- Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi 100000, Vietnam
- Esther Lizano
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
- Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, 70191 Stuttgart, Germany
- Arcadi Navarro
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Av. Doctor Aiguader, N88, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C. Wellington 30, 08005 Barcelona, Spain
- Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
- Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Ninh Binh Province 430000, Vietnam
- Chiea Chuen Khor
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- Jessica Lee
- Mandai Nature, 80 Mandai Lake Road, Singapore 729826, Republic of Singapore
- Patrick Tan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
- Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore 168582, Republic of Singapore
- Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Chambers Street, Edinburgh EH1 1JF, UK
- School of Geosciences, University of Edinburgh, Drummond Street, Edinburgh EH8 9XP, UK
- Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, 37077 Göttingen, Germany
- Leibniz Science Campus Primate Cognition, 37077 Göttingen, Germany
- Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra, Pg. Luís Companys 23, 08010 Barcelona, Spain
- Amanda Melin
- Department of Anthropology & Archaeology, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
- Department of Medical Genetics, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
- Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh EH8 9XP, UK
- Robin M D Beck
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
- Jean P Boubli
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
- Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Jinbo Xu
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
200
Chen SY, Zacharias M. What Makes a Good Protein-Protein Interaction Stabilizer: Analysis and Application of the Dual-Binding Mechanism. ACS CENTRAL SCIENCE 2023; 9:969-979. [PMID: 37252344 PMCID: PMC10214505 DOI: 10.1021/acscentsci.3c00003] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/05/2023] [Indexed: 05/31/2023]
Abstract
Protein-protein interactions (PPIs) are essential for biological processes, including immune reactions, and are involved in disease. Inhibition of PPIs by drug-like compounds is a common basis for therapeutic approaches. In many cases, the flat interface of protein-protein (PP) complexes prevents the discovery of compounds that bind specifically to cavities on one partner and inhibit the PPI. However, new pockets frequently form at the PP interface and can accommodate stabilizers. Stabilization is often as desirable as inhibition but is a much less explored strategy. Herein, we employ molecular dynamics simulations and pocket detection to investigate 18 known stabilizers and their associated PP complexes. In most cases, we find that effective stabilization requires a dual-binding mechanism, in which the stabilizer interacts with similar strength with each protein partner. A few stabilizers follow an allosteric mechanism instead, stabilizing the protein-bound structure and/or enhancing the PPI indirectly. Across 226 protein-protein complexes, we find interface cavities suitable for binding drug-like compounds in >75% of cases. We propose a computational compound identification workflow that exploits new PP interface cavities and optimizes the dual-binding mechanism, and we apply it to five PP complexes. Our study demonstrates great potential for the in silico discovery of PPI stabilizers, with a wide range of therapeutic applications.
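The abstract defines dual binding as similar interaction strength with each protein partner. One way to make that concrete is a balance ratio between the stabilizer's interaction energies with the two partners. The sketch below is hypothetical: the function names, the ratio itself, and the 0.5 threshold are illustrative assumptions, not the criterion used in the paper.

```python
def dual_binding_balance(e_a: float, e_b: float) -> float:
    """Ratio of the weaker to the stronger stabilizer-partner interaction energy.

    e_a, e_b: interaction energies of the stabilizer with partners A and B
    (any consistent unit; negative = favorable). Returns a value in [0, 1],
    where 1.0 means both partners contribute equally.
    """
    if e_a >= 0 or e_b >= 0:
        # No favorable contact with one partner, so this is not dual binding.
        return 0.0
    stronger = max(abs(e_a), abs(e_b))
    weaker = min(abs(e_a), abs(e_b))
    return weaker / stronger


def is_dual_binder(e_a: float, e_b: float, min_balance: float = 0.5) -> bool:
    """Hypothetical classifier: dual binding if the balance reaches min_balance."""
    return dual_binding_balance(e_a, e_b) >= min_balance


if __name__ == "__main__":
    print(is_dual_binder(-20.0, -18.0))  # balanced contacts: True
    print(is_dual_binder(-30.0, -3.0))   # dominated by partner A: False
```

A ratio is used instead of a difference so that the criterion is unit-free. It also does not depend on how strongly the stabilizer binds overall, only on how evenly that binding is shared between the partners.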