1
Yin J, Waman VP, Sen N, Firdaus-Raih M, Lam SD, Orengo C. Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database. Structure 2025; 33:613-631.e6. [PMID: 39826548 DOI: 10.1016/j.str.2024.12.016] [Received: 10/31/2023] [Revised: 04/18/2024] [Accepted: 12/19/2024] [Indexed: 01/22/2025]
Abstract
ATP-pyrophosphatases (ATP-PPases) are the most primordial lineage of the large and diverse HUP (HIGH-motif proteins, universal stress proteins, ATP-pyrophosphatase) superfamily. There are four different ATP-PPase substrate-specificity groups (SSGs), and members of each group show considerable sequence variation across the domains of life despite sharing the same catalytic function. Owing to the expansion in the number of ATP-PPase domain structures from advances in protein structure prediction by AlphaFold2 (AF2), we have characterized the two most populated ATP-PPase SSGs, the nicotinamide adenine dinucleotide synthases (NADSs) and guanosine monophosphate synthases (GMPSs). Local structural and sequence comparisons of NADS and GMPS identified taxonomic-group-specific functional motifs. As GMPS and NADS are potential drug targets in pathogenic microorganisms, including Mycobacterium tuberculosis, the bacterial GMPS- and NADS-specific functional motifs reported in this study may contribute to antibacterial drug development.
Affiliation(s)
- Jialin Yin
  - Department of Structural and Molecular Biology, University College London, London, UK
- Vaishali P Waman
  - Department of Structural and Molecular Biology, University College London, London, UK
- Neeladri Sen
  - Department of Structural and Molecular Biology, University College London, London, UK
- Mohd Firdaus-Raih
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
- Su Datt Lam
  - Department of Structural and Molecular Biology, University College London, London, UK
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London, UK
2
Harris C, Kapingidza AB, San JE, Christopher J, Gavitt T, Rhodes B, Janowska K, O'Donnell C, Lindenberger J, Huang X, Sammour S, Berry M, Barr M, Parks R, Newman A, Overton M, Oguin T, Acharya P, Haynes BF, Saunders KO, Wiehe K, Azoitei ML. Design of SARS-CoV-2 RBD Immunogens to Focus Immune Responses Towards Conserved Coronavirus Epitopes. bioRxiv: the preprint server for biology 2025:2025.01.09.632180. [PMID: 39829739 PMCID: PMC11741430 DOI: 10.1101/2025.01.09.632180] [Indexed: 01/22/2025]
Abstract
SARS-CoV-2 continues to evolve, with new variants emerging that evade pre-existing immunity and limit the efficacy of existing vaccines. One approach towards developing superior, variant-proof vaccines is to engineer immunogens that preferentially elicit antibodies with broad cross-reactivity against SARS-CoV-2 and its variants by targeting conserved epitopes on spike. The inner and outer faces of the Receptor Binding Domain (RBD) are two such conserved regions targeted by antibodies that recognize diverse human and animal coronaviruses. To promote the elicitation of such antibodies by vaccination, we engineered "resurfaced" RBD immunogens that contained mutations at exposed RBD residues outside the target epitopes. In the context of pre-existing immunity, these vaccine candidates aim to disfavor the elicitation of strain-specific antibodies against the immunodominant Receptor Binding Motif (RBM) while boosting the induction of inner and outer face antibodies. The engineered resurfaced RBD immunogens were stable, lacked binding to monoclonal antibodies with limited breadth, and maintained strong interactions with target broadly neutralizing antibodies. When used as vaccines, they limited humoral responses against the RBM as intended. Multimerization on nanoparticles further increased the immunogenicity of the resurfaced RBD immunogens, supporting resurfacing as a promising immunogen design approach for rationally shifting natural immune responses and developing more protective vaccines.
3
Waman V, Bordin N, Lau A, Kandathil S, Wells J, Miller D, Velankar S, Jones D, Sillitoe I, Orengo C. CATH v4.4: major expansion of CATH by experimental and predicted structural data. Nucleic Acids Res 2025; 53:D348-D355. [PMID: 39565206 PMCID: PMC11701635 DOI: 10.1093/nar/gkae1087] [Received: 09/23/2024] [Revised: 10/18/2024] [Accepted: 10/24/2024] [Indexed: 11/21/2024] Open
Abstract
CATH (https://www.cathdb.info) is a structural classification database that assigns domains to the structures in the Protein Data Bank (PDB) and AlphaFold Protein Structure Database (AFDB) and adds layers of biological information, including homology and functional annotation. This article covers developments in the CATH classification since 2021. We report the significant expansion of structural information (180-fold) for CATH superfamilies through classification of PDB domains and predicted domain structures from the Encyclopedia of Domains (TED) resource. TED provides information on predicted domains in AFDB. CATH v4.4 represents an expansion of ∼64 844 experimentally determined domain structures from PDB. We also present a mapping of ∼90 million predicted domains from TED to CATH superfamilies. New PDB and TED data increases the number of superfamilies from 5841 to 6573, folds from 1349 to 2078 and architectures from 41 to 77. TED data comprises predicted structures, so these new folds and architectures remain hypothetical until experimentally confirmed. CATH also classifies domains into functional families (FunFams) within a superfamily. We have updated sequences in FunFams by scanning FunFam-HMMs against UniProt release 2024_02, giving a 276% increase in FunFams coverage. The mapping of TED structural domains has resulted in a 4-fold increase in FunFams with structural information.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Andy Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
  - InstaDeep Ltd, 5 Merchant Square, London W2 1AY, UK
- Shaun Kandathil
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Jude Wells
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK
- David Miller
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- David T Jones
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
4
McCoy CJ, Wray CP, Freeman L, Crooks BA, Golinelli L, Marks NJ, Temmerman L, Beets I, Atkinson LE, Mousley A. Exploitation of phylum-spanning omics resources reveals complexity in the nematode FLP signalling system and provides insights into flp-gene evolution. BMC Genomics 2024; 25:1220. [PMID: 39702046 DOI: 10.1186/s12864-024-11111-6] [Received: 09/10/2024] [Accepted: 12/02/2024] [Indexed: 12/21/2024] Open
Abstract
BACKGROUND: Parasitic nematodes significantly undermine global human and animal health and productivity. Parasite control is reliant on anthelmintic administration; however, overuse of a limited number of drugs has resulted in escalating parasitic nematode resistance, threatening the sustainability of parasite control and underscoring an urgent need for the development of novel therapeutics. FMRFamide-like peptides (FLPs), the largest family of nematode neuropeptides, modulate nematode behaviours including those important for parasite survival, highlighting FLP receptors (FLP-GPCRs) as appealing putative novel anthelmintic targets. Advances in omics resources have enabled the identification of FLPs and neuropeptide-GPCRs in some parasitic nematodes, but remaining gaps in FLP-ligand libraries hinder the characterisation of receptor-ligand interactions, which are required to drive the development of novel control approaches.
RESULTS: In this study we exploited recent expansions in nematode genome data to identify 2143 flp-genes in > 100 nematode species across free-living, entomopathogenic, plant, and animal parasitic lifestyles and representing 7 of the 12 major nematode clades. Our data reveal that: (i) the phylum-spanning flps, flp-1, -8, -14, and -18, may be representative of the flp profile of the last common ancestor of nematodes; (ii) the majority of parasitic nematodes have a reduced flp complement relative to free-living species; (iii) FLP prepropeptide architecture is variable within and between flp-genes and across nematode species; (iv) FLP prepropeptide signatures facilitate flp-gene discrimination; (v) FLP motifs display variable length, amino acid sequence, and conservation; (vi) CLANS analysis provides insight into the evolutionary history of flp-gene sequelogues and reveals putative flp-gene paralogues; and (vii) flp expression is upregulated in the infective larval stage of several nematode parasites.
CONCLUSIONS: These data provide the foundation required for phylum-spanning FLP-GPCR deorphanisation screens in nematodes to seed the discovery and development of novel parasite control approaches.
Affiliation(s)
- Ciaran J McCoy
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
  - Animal Physiology and Neurobiology, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, Leuven, 3000, Belgium
- Christopher P Wray
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
- Laura Freeman
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
- Bethany A Crooks
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
- Luca Golinelli
  - Animal Physiology and Neurobiology, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, Leuven, 3000, Belgium
- Nikki J Marks
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
- Liesbet Temmerman
  - Animal Physiology and Neurobiology, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, Leuven, 3000, Belgium
- Isabel Beets
  - Animal Physiology and Neurobiology, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, Leuven, 3000, Belgium
- Louise E Atkinson
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
- Angela Mousley
  - School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, BT9 5DL, UK
5
Deneke VE, Blaha A, Lu Y, Suwita JP, Draper JM, Phan CS, Panser K, Schleiffer A, Jacob L, Humer T, Stejskal K, Krssakova G, Roitinger E, Handler D, Kamoshita M, Vance TDR, Wang X, Surm JM, Moran Y, Lee JE, Ikawa M, Pauli A. A conserved fertilization complex bridges sperm and egg in vertebrates. Cell 2024; 187:7066-7078.e22. [PMID: 39423812 DOI: 10.1016/j.cell.2024.09.035] [Received: 01/19/2024] [Revised: 07/25/2024] [Accepted: 09/19/2024] [Indexed: 10/21/2024]
Abstract
Fertilization, the basis for sexual reproduction, culminates in the binding and fusion of sperm and egg. Although several proteins are known to be crucial for this process in vertebrates, the molecular mechanisms remain poorly understood. Using an AlphaFold-Multimer screen, we identified the protein Tmem81 as part of a conserved trimeric sperm complex with the essential fertilization factors Izumo1 and Spaca6. We demonstrate that Tmem81 is essential for male fertility in zebrafish and mice. In line with trimer formation, we show that Izumo1, Spaca6, and Tmem81 interact in zebrafish sperm and that the human orthologs interact in vitro. Notably, complex formation creates the binding site for the egg fertilization factor Bouncer in zebrafish. Together, our work presents a comprehensive model for fertilization across vertebrates, where a conserved sperm complex binds to divergent egg proteins (Bouncer in fish and JUNO in mammals) to mediate sperm-egg interaction.
Affiliation(s)
- Victoria E Deneke
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Andreas Blaha
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Yonggang Lu
  - Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Osaka 565-0871, Japan
  - Department of Experimental Genome Research, Research Institute for Microbial Diseases, Osaka University, Osaka 565-0871, Japan
- Johannes P Suwita
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Jonne M Draper
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Clara S Phan
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Karin Panser
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Alexander Schleiffer
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Laurine Jacob
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Theresa Humer
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Karel Stejskal
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Gabriela Krssakova
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Elisabeth Roitinger
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Dominik Handler
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), 1030 Vienna, Austria
- Maki Kamoshita
  - Department of Experimental Genome Research, Research Institute for Microbial Diseases, Osaka University, Osaka 565-0871, Japan
- Tyler D R Vance
  - Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Xinyin Wang
  - Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Joachim M Surm
  - Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yehu Moran
  - Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem, Israel
- Jeffrey E Lee
  - Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Masahito Ikawa
  - Department of Experimental Genome Research, Research Institute for Microbial Diseases, Osaka University, Osaka 565-0871, Japan
  - Laboratory of Reproductive Systems Biology, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Andrea Pauli
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), 1030 Vienna, Austria
6
Jagadeesh J, Vembar SS. Evolution of sequence, structural and functional diversity of the ubiquitous DNA/RNA-binding Alba domain. Sci Rep 2024; 14:30363. [PMID: 39638848 PMCID: PMC11621453 DOI: 10.1038/s41598-024-79937-4] [Received: 06/20/2024] [Accepted: 11/13/2024] [Indexed: 12/07/2024] Open
Abstract
The DNA/RNA-binding Alba domain is prevalent across all kingdoms of life. First discovered in archaea, this protein domain has evolved from RNA- to DNA-binding, with a concomitant expansion in the range of cellular processes that it regulates. Despite its widespread presence, the full extent of its sequence, structural, and functional diversity remains unexplored. In this study, we employed iterative searches in PSI-BLAST to identify 15,161 unique Alba domain-containing proteins from the NCBI non-redundant protein database. Sequence similarity network (SSN) analysis clustered them into 13 distinct subgroups, including the archaeal Alba and eukaryotic Rpp20/Pop7 and Rpp25/Pop6 groups, as well as novel fungal and Plasmodium-specific Albas. Sequence and structural conservation analysis of the subgroups indicated high preservation of the dimer interface, with Alba domains from unicellular eukaryotes notably exhibiting structural deviations towards their C-terminal end. Finally, phylogenetic analysis, while supporting SSN clustering, revealed the evolutionary branchpoint at which the eukaryotic Rpp20- and Rpp25-like clades emerged from archaeal Albas, and the subsequent taxonomic lineage-based divergence within each clade. Taken together, this comprehensive analysis enhances our understanding of the evolutionary history of Alba domain-containing proteins across diverse organisms.
Affiliation(s)
- Jaiganesh Jagadeesh
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
7
Lau AM, Bordin N, Kandathil SM, Sillitoe I, Waman VP, Wells J, Orengo CA, Jones DT. Exploring structural diversity across the protein universe with The Encyclopedia of Domains. Science 2024; 386:eadq4946. [PMID: 39480926 DOI: 10.1126/science.adq4946] [Received: 05/21/2024] [Accepted: 08/30/2024] [Indexed: 11/02/2024]
Abstract
The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly expanding representation of their domain space. We uncovered more than 10,000 new structural interactions between superfamilies and thousands of new folds across the fold space continuum.
Affiliation(s)
- Andy M Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Jude Wells
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- David T Jones
  - Department of Computer Science, University College London, London WC1E 6BT, UK
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
8
Tants JN, Oberstrass L, Weigand JE, Schlundt A. Structure and RNA-binding of the helically extended Roquin CCCH-type zinc finger. Nucleic Acids Res 2024; 52:9838-9853. [PMID: 38953172 PMCID: PMC11381341 DOI: 10.1093/nar/gkae555] [Received: 01/19/2024] [Revised: 06/07/2024] [Accepted: 06/17/2024] [Indexed: 07/03/2024] Open
Abstract
Zinc finger (ZnF) domains appear in a pool of structural contexts and despite their small size achieve varying target specificities, covering single-stranded and double-stranded DNA and RNA as well as proteins. Combined with other RNA-binding domains, ZnFs enhance affinity and specificity of RNA-binding proteins (RBPs). The ZnF-containing immunoregulatory RBP Roquin initiates mRNA decay, thereby controlling the adaptive immune system. Its unique ROQ domain shape-specifically recognizes stem-looped cis-elements in mRNA 3'-untranslated regions (UTR). The N-terminus of Roquin contains a RING domain for protein-protein interactions and a ZnF, which was suggested to play an essential role in RNA decay by Roquin. The ZnF domain boundaries, its RNA motif preference and its interplay with the ROQ domain have remained elusive, also driven by the lack of high-resolution data of the challenging protein. We provide the solution structure of the Roquin-1 ZnF and use an RBNS-NMR pipeline to show that the ZnF recognizes AU-rich RNAs. We systematically refine the contributions of adenines in a poly(U)-background to specific complex formation. With the simultaneous binding of ROQ and ZnF to a natural target transcript of Roquin, our study for the first time suggests how Roquin integrates RNA shape and sequence features through the ROQ-ZnF tandem.
Affiliation(s)
- Jan-Niklas Tants
  - Institute for Molecular Biosciences and Biomolecular Resonance Center (BMRZ), Goethe University Frankfurt, Max-von-Laue-Str. 7-9, 60438 Frankfurt, Germany
- Lasse Oberstrass
  - University of Marburg, Department of Pharmacy, Institute of Pharmaceutical Chemistry, Marbacher Weg 6, 35037 Marburg, Germany
- Julia E Weigand
  - University of Marburg, Department of Pharmacy, Institute of Pharmaceutical Chemistry, Marbacher Weg 6, 35037 Marburg, Germany
- Andreas Schlundt
  - Institute for Molecular Biosciences and Biomolecular Resonance Center (BMRZ), Goethe University Frankfurt, Max-von-Laue-Str. 7-9, 60438 Frankfurt, Germany
  - University of Greifswald, Institute of Biochemistry, Felix-Hausdorff-Str. 4, 17489 Greifswald, Germany
9
Bordin N, Scholes H, Rauer C, Roca-Martínez J, Sillitoe I, Orengo C. Clustering protein functional families at large scale with hierarchical approaches. Protein Sci 2024; 33:e5140. [PMID: 39145441 PMCID: PMC11325189 DOI: 10.1002/pro.5140] [Received: 03/27/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024]
Abstract
Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database require more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Harry Scholes
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, UK
  - Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, Madrid, Spain
- Joel Roca-Martínez
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, UK
10
Waman VP, Bordin N, Alcraft R, Vickerstaff R, Rauer C, Chan Q, Sillitoe I, Yamamori H, Orengo C. CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds. J Mol Biol 2024; 436:168551. [PMID: 38548261 DOI: 10.1016/j.jmb.2024.168551] [Received: 01/31/2024] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new Nextflow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Rachel Alcraft
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Robert Vickerstaff
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Qian Chan
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Hazuki Yamamori
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
11
Waman VP, Ashford P, Lam SD, Sen N, Abbasian M, Woodridge L, Goldtzvik Y, Bordin N, Wu J, Sillitoe I, Orengo CA. Predicting human and viral protein variants affecting COVID-19 susceptibility and repurposing therapeutics. Sci Rep 2024; 14:14208. [PMID: 38902252 PMCID: PMC11190248 DOI: 10.1038/s41598-024-61541-1] [Received: 11/07/2023] [Accepted: 05/07/2024] [Indexed: 06/22/2024] Open
Abstract
COVID-19 is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed up entry into, and replication in, the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2-human protein 3D complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human-viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key immunity-associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Paul Ashford
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Su Datt Lam
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
| | - Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Laurel Woodridge
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Yonathan Goldtzvik
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Jiaxin Wu
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
|
12
|
Nikam R, Jemimah S, Gromiha MM. DeepPPAPredMut: deep ensemble method for predicting the binding affinity change in protein-protein complexes upon mutation. Bioinformatics 2024; 40:btae309. [PMID: 38718170 PMCID: PMC11112046 DOI: 10.1093/bioinformatics/btae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/08/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024]
Abstract
MOTIVATION: Protein-protein interactions underpin many cellular processes and their disruption due to mutations can lead to diseases. With the evolution of protein structure prediction methods like AlphaFold2 and the availability of extensive experimental affinity data, there is a pressing need for updated computational tools that can efficiently predict changes in binding affinity caused by mutations in protein-protein complexes. RESULTS: We developed a deep ensemble model that leverages protein sequences, predicted structure-based features, and protein functional classes to accurately predict the change in binding affinity due to mutations. The model achieved a correlation of 0.97 and a mean absolute error (MAE) of 0.35 kcal/mol on the training dataset, and maintained robust performance on the test set with a correlation of 0.72 and an MAE of 0.83 kcal/mol. Further validation using Leave-One-Out Complex (LOOC) cross-validation exhibited a correlation of 0.83 and an MAE of 0.51 kcal/mol, indicating consistent performance. AVAILABILITY AND IMPLEMENTATION: https://web.iitm.ac.in/bioinfo2/DeepPPAPredMut/index.html.
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Biomedical Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Computer Science, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
|
13
|
Zhong G, Zhao Y, Zhuang D, Chung WK, Shen Y. PreMode predicts mode-of-action of missense variants by deep graph representation learning of protein sequence and structural context. bioRxiv 2024:2024.02.20.581321. [PMID: 38746140 PMCID: PMC11092447 DOI: 10.1101/2024.02.20.581321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Accurate prediction of the functional impact of missense variants is important for disease gene discovery, clinical genetic diagnostics, therapeutic strategies, and protein engineering. Previous efforts have focused on predicting a binary pathogenicity classification, but the functional impact of missense variants is multi-dimensional. Pathogenic missense variants in the same gene may act through different modes of action (i.e., gain/loss-of-function) by affecting different aspects of protein function. They may result in distinct clinical conditions that require different treatments. We developed a new method, PreMode, to perform gene-specific mode-of-action predictions. PreMode models effects of coding sequence variants using SE(3)-equivariant graph neural networks on protein sequences and structures. Using the largest-to-date set of missense variants with known modes of action, we showed that PreMode reached state-of-the-art performance in multiple types of mode-of-action predictions by efficient transfer-learning. Additionally, PreMode's prediction of G/LoF variants in a kinase is consistent with inactive-active conformation transition energy changes. Finally, we show that PreMode enables efficient study design of deep mutational scans and optimization in protein engineering.
|
14
|
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
| | - Fábio Madeira
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Thiago Britto-Borges
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Geoffrey J Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK.
|
15
|
Chikunova A, Manley MP, Heijjer CN, Drenth CS, Cramer-Blok AJ, Ahmad MUD, Perrakis A, Ubbink M. Conserved proline residues prevent dimerization and aggregation in the β-lactamase BlaC. Protein Sci 2024; 33:e4972. [PMID: 38533527 DOI: 10.1002/pro.4972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 03/07/2024] [Accepted: 03/11/2024] [Indexed: 03/28/2024]
Abstract
Evolution leads to conservation of amino acid residues in protein families. Conserved proline residues are usually considered to ensure the correct folding and to stabilize the three-dimensional structure. Surprisingly, proline residues that are highly conserved in class A β-lactamases were found to tolerate various substitutions without large losses in enzyme activity. We investigated the roles of three conserved prolines at positions 107, 226, and 258 in the β-lactamase BlaC from Mycobacterium tuberculosis and found that mutations can lead to dimerization of the enzyme and an overall less stable protein that is prone to aggregate over time. For the variant Pro107Thr, the crystal structure shows dimer formation resembling domain swapping. It is concluded that the proline substitutions loosen the structure, enhancing multimerization. Even though the enzyme does not lose its properties without the conserved proline residues, the prolines ensure the long-term structural integrity of the enzyme.
Affiliation(s)
- A Chikunova
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - M P Manley
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - C N Heijjer
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - C S Drenth
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - A J Cramer-Blok
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - M Ud Din Ahmad
- Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Oncode Institute, Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - A Perrakis
- Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Oncode Institute, Division of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - M Ubbink
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
- Department of Infectious Diseases, Imperial College, London, UK
- Zocdoc, New York City, New York, USA
- ZoBio BV, Leiden, The Netherlands
|
16
|
Rocha A, Nguyen QAT, Haga-Yamanaka S. Type 2 vomeronasal receptor-A4 subfamily: Potential predator sensors in mice. Genesis 2024; 62:e23597. [PMID: 38590121 PMCID: PMC11018355 DOI: 10.1002/dvg.23597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 03/15/2024] [Accepted: 03/27/2024] [Indexed: 04/10/2024]
Abstract
Sensory signals detected by olfactory sensory organs are critical regulators of animal behavior. An accessory olfactory organ, the vomeronasal organ, detects cues from other animals and plays a pivotal role in intra- and inter-species interactions in mice. However, how ethologically relevant cues control mouse behavior through approximately 350 vomeronasal sensory receptor proteins largely remains elusive. The type 2 vomeronasal receptor-A4 (V2R-A4) subfamily members have been repeatedly detected from vomeronasal sensory neurons responsive to predator cues, suggesting a potential role of this receptor subfamily as a sensor for predators. This review focuses on this intriguing subfamily, delving into its receptor functions and genetic characteristics.
Affiliation(s)
- Andrea Rocha
- Neuroscience Graduate Program, University of California, Riverside, Riverside, California, USA
| | - Quynh Anh Thi Nguyen
- Neuroscience Graduate Program, University of California, Riverside, Riverside, California, USA
| | - Sachiko Haga-Yamanaka
- Neuroscience Graduate Program, University of California, Riverside, Riverside, California, USA
- Department of Molecular, Cell and Systems Biology, University of California, Riverside, Riverside, California, USA
|
17
|
Cannariato M, Zizzi EA, Pallante L, Miceli M, Deriu MA. Mechanical communication within the microtubule through network-based analysis of tubulin dynamics. Biomech Model Mechanobiol 2024; 23:569-579. [PMID: 38060156 PMCID: PMC10963519 DOI: 10.1007/s10237-023-01792-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 11/11/2023] [Indexed: 12/08/2023]
Abstract
The identification of the mechanisms underlying the transfer of mechanical vibrations in protein complexes is crucial to understand how these super-assemblies are stabilized to perform specific functions within the cell. In this context, the study of the structural communication and the propagation of mechanical stimuli within the microtubule (MT) is important given the pivotal role of the latter in cell viability. In this study, we employed molecular modelling and the dynamical network analysis approaches to analyse the MT. The results highlight that β-tubulin drives the transfer of mechanical information between protofilaments (PFs), which is altered at the seam due to a different interaction pattern. Moreover, while the key residues involved in the structural communication along the PF are generally conserved, a higher diversity was observed for amino acids mediating the lateral communication. Taken together, these results might explain why MTs with different PF numbers are formed in different organisms or with different β-tubulin isotypes.
Affiliation(s)
- Marco Cannariato
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Eric A Zizzi
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Lorenzo Pallante
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Marcello Miceli
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Marco A Deriu
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy.
|
18
|
Blake KS, Kumar H, Loganathan A, Williford EE, Diorio-Toth L, Xue YP, Tang WK, Campbell TP, Chong DD, Angtuaco S, Wencewicz TA, Tolia NH, Dantas G. Sequence-structure-function characterization of the emerging tetracycline destructase family of antibiotic resistance enzymes. Commun Biol 2024; 7:336. [PMID: 38493211 PMCID: PMC10944477 DOI: 10.1038/s42003-024-06023-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 03/07/2024] [Indexed: 03/18/2024]
Abstract
Tetracycline destructases (TDases) are flavin monooxygenases which can confer resistance to all generations of tetracycline antibiotics. The recent increase in the number and diversity of reported TDase sequences enables a deep investigation of the TDase sequence-structure-function landscape. Here, we evaluate the sequence determinants of TDase function through two complementary approaches: (1) constructing profile hidden Markov models to predict new TDases, and (2) using multiple sequence alignments to identify conserved positions important to protein function. Using the HMM-based approach we screened 50 high-scoring candidate sequences in Escherichia coli, leading to the discovery of 13 new TDases. The X-ray crystal structures of two new enzymes from Legionella species were determined, and the ability of anhydrotetracycline to inhibit their tetracycline-inactivating activity was confirmed. Using the MSA-based approach we identified 31 amino acid positions 100% conserved across all known TDase sequences. The roles of these positions were analyzed by alanine-scanning mutagenesis in two TDases, to study the impact on cell and in vitro activity, structure, and stability. These results expand the diversity of TDase sequences and provide valuable insights into the roles of important residues in TDases, and flavin monooxygenases more broadly.
Affiliation(s)
- Kevin S Blake
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Hirdesh Kumar
- Host-Pathogen Interactions and Structural Vaccinology section (HPISV), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Anisha Loganathan
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Emily E Williford
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA
| | - Luke Diorio-Toth
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Yao-Peng Xue
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Wai Kwan Tang
- Host-Pathogen Interactions and Structural Vaccinology section (HPISV), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Tayte P Campbell
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - David D Chong
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Steven Angtuaco
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
| | - Timothy A Wencewicz
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA.
| | - Niraj H Tolia
- Host-Pathogen Interactions and Structural Vaccinology section (HPISV), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), Bethesda, MD, USA.
| | - Gautam Dantas
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Pathology and Immunology, Division of Laboratory and Genomic Medicine, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA.
|
19
|
Hernández-Prieto JH, Martini VP, Iulek J. Structure of glyceraldehyde-3-phosphate dehydrogenase from Paracoccidioides lutzii in complex with an aldonic sugar acid. Biochimie 2024; 218:20-33. [PMID: 37709188 DOI: 10.1016/j.biochi.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 07/23/2023] [Accepted: 09/11/2023] [Indexed: 09/16/2023]
Abstract
The pathogen Paracoccidioides lutzii (Pb01) is found in the South American countries Colombia, Ecuador, Venezuela and Brazil, especially in the central, west, and north regions of the latter. It belongs to the Ajellomycetaceae family, Onygenales order, and is typically thermodimorphic, presenting yeast cells when it grows in animal tissues, but mycelia when in the environment, where it produces the infectious propagule. This fungus is one of the etiologic agents of Paracoccidioidomycosis (PCM), the most important endemic fungal infection in Latin America. Investigations on its genome have contributed to a better understanding of its metabolism and revealed the complexity of several metabolic glycolytic pathways. Glyceraldehyde-3-Phosphate Dehydrogenase from Paracoccidioides lutzii (PlGAPDH) is considered a moonlighting protein and participates in several biological processes of this pathogen. The enzyme was expressed and purified, as seen in SDS-PAGE gel, crystallized and had its three-dimensional (3D) structure determined in complex with NAD+, a sulphate ion and d-galactonic acid, therefore, a type of 'GAA site'. It is the first GAPDH structure to show this chemical type in this site and how this protein can bind an acid derived from oxidation of a linear hexose.
Affiliation(s)
| | | | - Jorge Iulek
- Department of Chemistry, State University of Ponta Grossa, Ponta Grossa, PR, 84030-900, Brazil.
|
20
|
Khowal S, Zhang D, Yong WH, Heaney AP. Whole-exome sequencing reveals genetic variants that may play a role in neurocytomas. J Neurooncol 2024; 166:471-483. [PMID: 38319496 DOI: 10.1007/s11060-024-04567-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/09/2024] [Indexed: 02/07/2024]
Abstract
OBJECTIVES: Neurocytomas (NCs) are rare intracranial tumors that can often be surgically resected. However, disease course is unpredictable in many patients and medical therapies are lacking. We have used whole exome sequencing to explore the molecular etiology for neurocytoma and assist in target identification to develop novel therapeutic interventions. METHODS: We used whole exome sequencing (WES) to compare the molecular landscape of 21 primary and recurrent NCs to five normal cerebellar control samples. WES data was analyzed using the Qiagen Clinical Insight program, variants of interest (VOI) were interrogated using ConSurf, ScoreCons, and Ingenuity Pathway Analysis Software to predict their potential functional effects, and copy number variations (CNVs) in the genes of interest were analyzed by Genewiz (Azenta Life Sciences). RESULTS: Of 40 VOI involving thirty-six genes, 7 were pathogenic, 17 likely-pathogenic, and 16 of uncertain-significance. Of seven pathogenic NC associated variants, Glucosylceramidase beta 1 [GBA1 c.703T>C (p.S235P)] was mutated in 5/21 (24%), Coagulation factor VIII [F8 c.3637dupA (p.I1213fs*28)] in 4/21 (19%), Phenylalanine hydroxylase [PAH c.975C>A (p.Y325*)] in 3/21 (14%), and Fanconi anemia complementation group C [FANCC c.1162G>T (p.G388*)], Chromodomain helicase DNA binding protein 7 [CHD7 c.2839C>T (p.R947*)], Myosin VIIA [MYO7A c.940G>T (p.E314*)] and Dynein axonemal heavy chain 11 [DNAH11 c.3544C>T (p.R1182*)] in 2/21 (9.5%) NCs each. CNVs were noted in 85% of these latter 7 genes. Interestingly, a Carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 2 variant [CTDSP2 c.472G>A (p.E158K)] of uncertain significance was also found in >70% of NC cases. INTERPRETATION: The variants of interest we identified in the NCs regulate a variety of neurological processes including cilia motility, cell metabolism, immune responses, and DNA damage repair and provide novel insights into the molecular pathogenesis of these extremely rare tumors.
Affiliation(s)
- Sapna Khowal
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Dongyun Zhang
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - William H Yong
- Department of Pathology and Laboratory Medicine, University of California, Irvine, CA, 92868, USA
| | - Anthony P Heaney
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA.
- Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA.
|
21
|
Pandey M, Shah SK, Gromiha MM. Computational approaches for identifying disease-causing mutations in proteins. Adv Protein Chem Struct Biol 2023; 139:141-171. [PMID: 38448134 DOI: 10.1016/bs.apcsb.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Advancements in genome sequencing have expanded the scope of investigating mutations in proteins across different diseases. Amino acid mutations in a protein alter its structure, stability and function and some of them lead to diseases. Identification of disease-causing mutations is a challenging task and it will be helpful for designing therapeutic strategies. Hence, mutation data available in the literature have been curated and stored in several databases, which have been effectively utilized for developing computational methods to identify deleterious mutations (drivers), using sequence and structure-based properties of proteins. In this chapter, we describe the contents of specific databases that have information on disease-causing and neutral mutations followed by sequence and structure-based properties. Further, characteristic features of disease-causing mutations will be discussed along with computational methods for identifying cancer hotspot residues and disease-causing mutations in proteins.
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - Suraj Kumar Shah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan.
|
22
|
Radford EJ, Tan HK, Andersson MHL, Stephenson JD, Gardner EJ, Ironfield H, Waters AJ, Gitterman D, Lindsay S, Abascal F, Martincorena I, Kolesnik-Taylor A, Ng-Cordell E, Firth HV, Baker K, Perry JRB, Adams DJ, Gerety SS, Hurles ME. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat Commun 2023; 14:7702. [PMID: 38057330 PMCID: PMC10700591 DOI: 10.1038/s41467-023-43041-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/20/2022] [Accepted: 10/30/2023] [Indexed: 12/08/2023]
Abstract
Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have tumour promoting and suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally-abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.
Affiliation(s)
- Elizabeth J Radford
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Department of Paediatrics, University of Cambridge, Level 8, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | - Hong-Kee Tan
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
| | | | | | - Eugene J Gardner
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | | | | | | | | | | | | | | | - Elise Ng-Cordell
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Psychology, University of British Columbia, Vancouver, Canada
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - Kate Baker
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - John R B Perry
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
|
23
|
Baltrusaitis EE, Ravitch EE, Fenton AR, Perez TA, Holzbaur ELF, Dominguez R. Interaction between the mitochondrial adaptor MIRO and the motor adaptor TRAK. J Biol Chem 2023; 299:105441. [PMID: 37949220 PMCID: PMC10746525 DOI: 10.1016/j.jbc.2023.105441] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/28/2023] [Accepted: 10/16/2023] [Indexed: 11/12/2023]
Abstract
MIRO (mitochondrial Rho GTPase) consists of two GTPase domains flanking two Ca2+-binding EF-hand domains. A C-terminal transmembrane helix anchors MIRO to the outer mitochondrial membrane, where it functions as a general adaptor for the recruitment of cytoskeletal proteins that control mitochondrial dynamics. One protein recruited by MIRO is TRAK (trafficking kinesin-binding protein), which in turn recruits the microtubule-based motors kinesin-1 and dynein-dynactin. The mechanism by which MIRO interacts with TRAK is not well understood. Here, we map and quantitatively characterize the interaction of human MIRO1 and TRAK1 and test its potential regulation by Ca2+ and/or GTP binding. TRAK1 binds MIRO1 with low micromolar affinity. The interaction was mapped to a fragment comprising MIRO1's EF-hands and C-terminal GTPase domain and to a conserved sequence motif within TRAK1 residues 394 to 431, immediately C-terminal to the Spindly motif. This sequence is sufficient for MIRO1 binding in vitro and is necessary for MIRO1-dependent localization of TRAK1 to mitochondria in cells. MIRO1's EF-hands bind Ca2+ with dissociation constants (KD) of 3.9 μM and 300 nM. This suggests that under cellular conditions one EF-hand may be constitutively bound to Ca2+ whereas the other EF-hand binds Ca2+ in a regulated manner, depending on its local concentration. Yet, the MIRO1-TRAK1 interaction is independent of Ca2+ binding to the EF-hands and of the nucleotide state (GDP or GTP) of the C-terminal GTPase. The interaction is also independent of TRAK1 dimerization, such that a TRAK1 dimer can be expected to bind two MIRO1 molecules on the mitochondrial surface.
Affiliation(s)
- Elana E Baltrusaitis
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Erika E Ravitch
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Adam R Fenton
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Tania A Perez
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Erika L F Holzbaur
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Roberto Dominguez
- Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
|
24
|
Popov P, Kalinin R, Buslaev P, Kozlovskii I, Zaretckii M, Karlov D, Gabibov A, Stepanov A. Unraveling viral drug targets: a deep learning-based approach for the identification of potential binding sites. Brief Bioinform 2023; 25:bbad459. [PMID: 38113077 PMCID: PMC10783863 DOI: 10.1093/bib/bbad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has spurred a wide range of approaches to control and combat the disease. However, selecting an effective antiviral drug target remains a time-consuming challenge. Computational methods offer a promising solution by efficiently reducing the number of candidates. In this study, we propose a structure- and deep learning-based approach that identifies vulnerable regions in viral proteins corresponding to drug binding sites. Our approach takes into account the protein dynamics, accessibility and mutability of the binding site and the putative mechanism of action of the drug. We applied this technique to validate drug targeting toward severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein S. Our findings reveal a conformation- and oligomer-specific glycan-free binding site proximal to the receptor binding domain. This site comprises topologically important amino acid residues. Molecular dynamics simulations of Spike in complex with candidate drug molecules bound to the potential binding sites indicate an equilibrium shifted toward the inactive conformation compared with drug-free simulations. Small molecules targeting this binding site have the potential to prevent the closed-to-open conformational transition of Spike, thereby allosterically inhibiting its interaction with human angiotensin-converting enzyme 2 receptor. Using a pseudotyped virus-based assay with a SARS-CoV-2 neutralizing antibody, we identified a set of hit compounds that exhibited inhibition at micromolar concentrations.
Affiliation(s)
- Petr Popov
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
| | - Roman Kalinin
- M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
| | - Pavel Buslaev
- Nanoscience Center and Department of Chemistry, University of Jyväskylä, 40014, Jyväskylä, Finland
| | - Igor Kozlovskii
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
| | - Mark Zaretckii
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
| | - Dmitry Karlov
- School of Pharmacy, Medical Biology Centre, Queen’s University Belfast, Street, Belfast, BT9 7BL Northern Ireland, U.K
| | - Alexander Gabibov
- M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
| | - Alexey Stepanov
- Department of Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road MB-10, La Jolla, 92037, CA, USA
|
25
|
Nikam R, Yugandhar K, Gromiha MM. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 2023; 55:1305-1316. [PMID: 36574037 DOI: 10.1007/s00726-022-03228-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022]
Abstract
MOTIVATION: Protein-protein interactions (PPIs) are important in governing several cellular activities. Amino acid residues located at the interface are known as binding sites, and information about binding sites helps to understand the binding affinities and functions of protein-protein complexes. RESULTS: We have developed a deep neural network-based method, DeepBSRPred, for predicting the binding sites using protein sequence information and predicted structures from AlphaFold2. Specific sequence and structure-based features include position-specific scoring matrix (PSSM), solvent accessible surface area, conservation score and amino acid properties, and residue depth, respectively. Our method predicted the binding sites with an average F1 score of 0.73 in a dataset of 1236 proteins. Further, we compared the performance with other existing methods in the literature using four benchmark datasets and our method outperformed those methods. AVAILABILITY AND IMPLEMENTATION: The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html, along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
| | - Kumar Yugandhar
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Department of Computational Biology, Cornell University, New York, NY, USA
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India.
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan.
|
26
|
Plaza DF, Zerebinski J, Broumou I, Lautenbach MJ, Ngasala B, Sundling C, Färnert A. A genomic platform for surveillance and antigen discovery in Plasmodium spp. using long-read amplicon sequencing. CELL REPORTS METHODS 2023; 3:100574. [PMID: 37751696 PMCID: PMC10545912 DOI: 10.1016/j.crmeth.2023.100574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 06/18/2023] [Accepted: 08/07/2023] [Indexed: 09/28/2023]
Abstract
Many vaccine candidate proteins in the malaria parasite Plasmodium falciparum are under strong immunological pressure and confer antigenic diversity. We present a sequencing and data analysis platform for the genomic surveillance of the insertion or deletion (indel)-rich antigens merozoite surface protein 1 (MSP1), MSP2, glutamate-rich protein (GLURP), and CSP from P. falciparum using long-read circular consensus sequencing (CCS) in multiclonal malaria isolates. Our platform uses 40 PCR primers per gene to asymmetrically barcode and identify multiclonal infections in pools of up to 384 samples. With msp2, we validated the method using 235 mock infections combining 10 synthetic variants at different concentrations and infection complexities. We applied this strategy to P. falciparum isolates from a longitudinal cohort in Tanzania. Finally, we constructed an analysis pipeline that streamlines the processing and interpretation of epidemiological and antigenic diversity data from demultiplexed FASTQ files. This platform can be easily adapted to other polymorphic antigens of interest in Plasmodium or any other human pathogen.
Affiliation(s)
- David Fernando Plaza
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden.
| | - Julia Zerebinski
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden
| | - Ioanna Broumou
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden
| | - Maximilian Julius Lautenbach
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden
| | - Billy Ngasala
- Muhimbili University of Health and Allied Sciences, Dar es Salaam 57RF+V8, Tanzania
| | - Christopher Sundling
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden
| | - Anna Färnert
- Division of Infectious Diseases, Department of Medicine Solna and Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, 17176 Stockholm, Sweden
|
27
|
Seifert-Davila W, Girbig M, Hauptmann L, Hoffmann T, Eustermann S, Müller CW. Structural insights into human TFIIIC promoter recognition. SCIENCE ADVANCES 2023; 9:eadh2019. [PMID: 37418517 PMCID: PMC11811891 DOI: 10.1126/sciadv.adh2019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 06/02/2023] [Indexed: 07/09/2023]
Abstract
Transcription factor (TF) IIIC recruits RNA polymerase (Pol) III to most of its target genes. Recognition of intragenic A- and B-box motifs in transfer RNA (tRNA) genes by TFIIIC modules τA and τB is the first critical step for tRNA synthesis but is mechanistically poorly understood. Here, we report cryo-electron microscopy structures of the six-subunit human TFIIIC complex unbound and bound to a tRNA gene. The τB module recognizes the B-box via DNA shape and sequence readout through the assembly of multiple winged-helix domains. TFIIIC220 forms an integral part of both τA and τB connecting the two subcomplexes via a ~550-amino acid residue flexible linker. Our data provide a structural mechanism by which high-affinity B-box recognition anchors TFIIIC to promoter DNA and permits scanning for low-affinity A-boxes and TFIIIB for Pol III activation.
Affiliation(s)
- Wolfram Seifert-Davila
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Candidate for joint PhD degree from EMBL and Faculty of Biosciences, Heidelberg University, 69120 Heidelberg, Germany
| | - Mathias Girbig
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Luis Hauptmann
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Thomas Hoffmann
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Sebastian Eustermann
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Christoph W. Müller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
28
|
La Sala G, Pfleger C, Käck H, Wissler L, Nevin P, Böhm K, Janet JP, Schimpl M, Stubbs CJ, De Vivo M, Tyrchan C, Hogner A, Gohlke H, Frolov AI. Combining structural and coevolution information to unveil allosteric sites. Chem Sci 2023; 14:7057-7067. [PMID: 37389247 PMCID: PMC10306073 DOI: 10.1039/d2sc06272k] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Accepted: 06/02/2023] [Indexed: 07/01/2023] Open
Abstract
Understanding allosteric regulation in biomolecules is of great interest to pharmaceutical research, and computational methods have emerged during the last decades to characterize allosteric coupling. However, the prediction of allosteric sites in a protein structure remains a challenging task. Here, we integrate local binding site information, coevolutionary information, and information on dynamic allostery into a structure-based three-parameter model to identify potentially hidden allosteric sites in ensembles of protein structures with orthosteric ligands. When tested on five allosteric proteins (LFA-1, p38-α, GR, MAT2A, and BCKDK), the model successfully ranked all known allosteric pockets in the top three positions. Finally, we identified a novel druggable site in MAT2A confirmed by X-ray crystallography and SPR and a hitherto unknown druggable allosteric site in BCKDK validated by biochemical and X-ray crystallography analyses. Our model can be applied in drug discovery to identify allosteric pockets.
Affiliation(s)
- Giuseppina La Sala
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Christopher Pfleger
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Helena Käck
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Lisa Wissler
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Philip Nevin
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Kerstin Böhm
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Marianne Schimpl
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Christopher J Stubbs
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Marco De Vivo
- Laboratory of Molecular Modeling and Drug Design, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
| | - Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory & Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Anders Hogner
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Holger Gohlke
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry), Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Andrey I Frolov
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
|
29
|
Kibby EM, Conte AN, Burroughs AM, Nagy TA, Vargas JA, Whalen LA, Aravind L, Whiteley AT. Bacterial NLR-related proteins protect against phage. Cell 2023; 186:2410-2424.e18. [PMID: 37160116 PMCID: PMC10294775 DOI: 10.1016/j.cell.2023.04.015] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 04/07/2023] [Indexed: 05/11/2023]
Abstract
Bacteria use a wide range of immune pathways to counter phage infection. A subset of these genes shares homology with components of eukaryotic immune systems, suggesting that eukaryotes horizontally acquired certain innate immune genes from bacteria. Here, we show that proteins containing a NACHT module, the central feature of the animal nucleotide-binding domain and leucine-rich repeat containing gene family (NLRs), are found in bacteria and defend against phages. NACHT proteins are widespread in bacteria, provide immunity against both DNA and RNA phages, and display the characteristic C-terminal sensor, central NACHT, and N-terminal effector modules. Some bacterial NACHT proteins have domain architectures similar to the human NLRs that are critical components of inflammasomes. Human disease-associated NLR mutations that cause stimulus-independent activation of the inflammasome also activate bacterial NACHT proteins, supporting a shared signaling mechanism. This work establishes that NACHT module-containing proteins are ancient mediators of innate immunity across the tree of life.
Affiliation(s)
- Emily M Kibby
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA
| | - Amy N Conte
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA
| | - A Maxwell Burroughs
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Toni A Nagy
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA
| | - Jose A Vargas
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA
| | - Lindsay A Whalen
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA
| | - L Aravind
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Aaron T Whiteley
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO 80303, USA.
|
30
|
Son S, Kim B, Yang J, Kim VN. Role of the proline-rich disordered domain of DROSHA in intronic microRNA processing. Genes Dev 2023; 37:383-397. [PMID: 37236670 PMCID: PMC10270192 DOI: 10.1101/gad.350275.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Accepted: 04/24/2023] [Indexed: 05/28/2023]
Abstract
DROSHA serves as a gatekeeper of the microRNA (miRNA) pathway by processing primary transcripts (pri-miRNAs). While the functions of the structured domains of DROSHA have been well documented, the contribution of the N-terminal proline-rich disordered domain (PRD) remains elusive. Here we show that the PRD promotes the processing of miRNA hairpins located within introns. We identified a DROSHA isoform (p140) lacking the PRD, which is produced by proteolytic cleavage. Small RNA sequencing revealed that p140 is significantly impaired in the maturation of intronic miRNAs. Consistently, our minigene constructs demonstrated that the PRD enhances the processing of intronic hairpins, but not those in exons. Splice site mutations did not affect the PRD's enhancing effect on intronic constructs, suggesting that the PRD acts independently of the splicing reaction by interacting with sequences residing within introns. The N-terminal regions from zebrafish and Xenopus DROSHA can replace the human counterpart, indicating functional conservation despite poor sequence alignment. Moreover, we found that rapidly evolving intronic miRNAs are generally more dependent on the PRD than conserved ones, suggesting a role of the PRD in miRNA evolution. Our study reveals a new layer of miRNA regulation mediated by a low-complexity disordered domain that senses the genomic contexts of miRNA loci.
Affiliation(s)
- Soomin Son
- Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Baekgyu Kim
- Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Jihye Yang
- Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - V Narry Kim
- Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
|
31
|
Pandey M, Gromiha MM. MutBLESS: A tool to identify disease-prone sites in cancer using deep learning. Biochim Biophys Acta Mol Basis Dis 2023; 1869:166721. [PMID: 37105446 DOI: 10.1016/j.bbadis.2023.166721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/07/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Understanding the molecular basis and impact of mutations at different stages of cancer are long-standing challenges in cancer biology. Identification of driver mutations from experiments is expensive and time intensive. In the present study, we collected the data for experimentally known driver mutations in 22 different cancer types and classified them into six categories: breast cancer (BRCA), acute myeloid leukaemia (LAML), endometrial carcinoma (EC), stomach cancer (STAD), skin cancer (SKCM), and other cancer types; the dataset contains 5747 disease-prone and 5514 neutral sites in 516 proteins. The analysis of amino acid distribution along mutant sites revealed that the motifs AAA and LR are preferred in disease-prone sites whereas QPP and QF are dominant in neutral sites. Further, we developed a method using deep neural networks to predict disease-prone sites with amino acid sequence-based features such as physicochemical properties, secondary structure, tri-peptide motifs and conservation scores. We obtained an average AUC of 0.97 in five cancer types (BRCA, LAML, EC, STAD and SKCM) in a test dataset and 0.72 in all other cancer types together. Our method showed excellent performance for identifying cancer-specific mutations with an average sensitivity, specificity, and accuracy of 96.56%, 97.39%, and 97.64%, respectively. We developed a web server for identifying cancer-prone sites, and it is available at https://web.iitm.ac.in/bioinfo2/MutBLESS/index.html. We suggest that our method can serve as an effective way to identify disease-prone sites and assist in developing therapeutic strategies.
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.
|
32
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
|
33
|
Yariv B, Yariv E, Kessel A, Masrati G, Chorin AB, Martz E, Mayrose I, Pupko T, Ben‐Tal N. Using evolutionary data to make sense of macromolecules with a "face-lifted" ConSurf. Protein Sci 2023; 32:e4582. [PMID: 36718848 PMCID: PMC9942591 DOI: 10.1002/pro.4582] [Citation(s) in RCA: 145] [Impact Index Per Article: 72.5] [Received: 09/12/2022] [Revised: 01/21/2023] [Accepted: 01/27/2023] [Indexed: 02/01/2023]
Abstract
The ConSurf web server for the analysis of proteins, RNA, and DNA provides a quick and accurate estimate of the per-site evolutionary rate among homologues. The analysis reveals functionally important regions, such as catalytic and ligand-binding sites, which often evolve slowly. Since the last report in 2016, ConSurf has been improved in multiple ways. It now has a user-friendly interface that makes it easier to perform the analysis and to visualize the results. Evolutionary rates are calculated based on a set of homologous sequences, collected using hidden Markov model-based search tools, recently embedded in the pipeline. Using these, and following the removal of redundancy, ConSurf assembles a representative set of effective homologues for protein and nucleic acid queries to enable informative analysis of the evolutionary patterns. The analysis is particularly insightful when the evolutionary rates are mapped on the macromolecule structure. In this respect, the availability of AlphaFold model structures of essentially all UniProt proteins makes ConSurf particularly relevant to the research community. The UniProt ID of a query protein with an available AlphaFold model can now be used to start a calculation. Another important improvement is the Python re-implementation of the entire computational pipeline, making it easier to maintain. This Python pipeline is now available for download as a standalone version. We demonstrate some of ConSurf's key capabilities by the analysis of caveolin-1, the main protein of membrane invaginations called caveolae.
Affiliation(s)
- Barak Yariv
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
| | - Elon Yariv
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
| | - Amit Kessel
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
| | - Gal Masrati
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
| | - Adi Ben Chorin
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
| | - Eric Martz
- Department of Microbiology, University of Massachusetts, Amherst, Massachusetts, USA
| | - Itay Mayrose
- George S. Wise Faculty of Life Sciences, School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv, Israel
| | - Tal Pupko
- George S. Wise Faculty of Life Sciences, The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv, Israel
| | - Nir Ben‐Tal
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
|
34
|
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023] Open
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented on TABAJARA, a program that can be used to either detect all biological sequences of a group or discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof-of-principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups presenting different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, and profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented on TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
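The position-specific scoring idea described in this abstract can be illustrated with a minimal Python sketch. It is not TABAJARA's code or its actual scoring scheme: it assumes plain Shannon information content per alignment column over a 20-letter amino-acid alphabet, ignores gap characters, and the function names, window size, and threshold are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Per-column information content (bits) of an aligned set of sequences.

    Assumption: information = log2(alphabet_size) - Shannon entropy of the
    residues observed in the column, with gaps ('-') ignored.
    """
    max_bits = math.log2(alphabet_size)
    scores = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        total = len(residues)
        entropy = 0.0
        for count in Counter(residues).values():
            p = count / total
            entropy -= p * math.log2(p)
        scores.append(max_bits - entropy)
    return scores


def informative_windows(scores, window=3, threshold=4.0):
    """Start indices of windows whose mean information meets the threshold."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= threshold
    ]


# Toy alignment (hypothetical sequences, not from the paper).
toy_alignment = ["MKVLA", "MKVIA", "MKALG", "MKV-A"]
toy_scores = column_information(toy_alignment)
toy_windows = informative_windows(toy_scores, window=2, threshold=4.0)
```

In this toy case only the window starting at the first column is fully conserved, so it is the only one selected. A real workflow would then extract the selected alignment blocks and build a profile HMM from each with a dedicated tool; that step is outside this sketch.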
Affiliation(s)
- Liliane S. Oliveira
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
| | - Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
| | - Bas E. Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
| | - Arthur Gruber
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
| |
|
35
|
Experimental and clinical data analysis for identification of COVID-19 resistant ACE2 mutations. Sci Rep 2023; 13:2351. [PMID: 36759535 PMCID: PMC9910265 DOI: 10.1038/s41598-022-20773-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/02/2022] [Accepted: 09/19/2022] [Indexed: 02/11/2023] Open
Abstract
The Coronavirus Disease-2019 (COVID-19) epidemic is a high-magnitude zoonotic event caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The disease has a higher rate of spread than of mortality in humans. The human receptor Angiotensin-Converting Enzyme 2 (ACE2) is the leading target site for the viral Spike protein (S-protein), which functions as the binding ligand and is responsible for viral entry in humans. Patients infected with COVID-19 who have comorbidities, particularly cancer patients, experience severe effects or a high mortality rate because of their suppressed immune systems. Nevertheless, some cancer patients might not be infected by SARS-CoV-2 because of mutations in ACE2 that may confer resistance to the spillover between species. This study aimed to determine mutations in the sequence of the human ACE2 protein, and their effect on its dissociation from SARS-CoV-2, that might impede viral transmission. In silico approaches were used to assess the impact on the SARS-CoV-2 S-protein interaction of ACE2 mutations that have been validated experimentally, observed in patients, or reported in cell lines. The identified changes significantly affect the SARS-CoV-2 S-protein interaction with ACE2, demonstrating a reduction in binding affinity compared to SARS-CoV. The data presented in this study suggest that individual ACE2 mutants have either higher or lower affinity for the SARS-CoV-2 S-protein than the wild-type human ACE2 receptor. This study could be used to report SARS-CoV-2-resistant ACE2 mutations and to design active peptides that inhibit the viral spread of SARS-CoV-2 in humans.
|
36
|
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
|
37
|
Adeyelu T, Bordin N, Waman VP, Sadlej M, Sillitoe I, Moya-Garcia AA, Orengo CA. KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units. Biomolecules 2023; 13:277. [PMID: 36830646 PMCID: PMC9953599 DOI: 10.3390/biom13020277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023] Open
Abstract
Protein kinases are important targets for treating human disorders, and they are the second most targeted families after G-protein coupled receptors. Several resources provide classification of kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses multi-domain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared to the widely used KinBase classification. Our protocol provides a comprehensive classification for kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 are provided via our CATH FTP website and Zenodo. This gives the domain structure representative of each KinFam together with information on any drug compounds available. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.
Affiliation(s)
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Department of Comparative Biomedical Science, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Vaishali P. Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Marta Sadlej
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Aurelio A. Moya-Garcia
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
- Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, 29071 Málaga, Spain
| | - Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| |
|
38
|
Hasenahuer MA, Sanchis-Juan A, Laskowski RA, Baker JA, Stephenson JD, Orengo CA, Raymond FL, Thornton JM. Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins. J Mol Biol 2023; 435:167892. [PMID: 36410474 PMCID: PMC9875310 DOI: 10.1016/j.jmb.2022.167892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 11/23/2022]
Abstract
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
Affiliation(s)
- Marcia A. Hasenahuer
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK; Corresponding author at: European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK. @MarHasenahuer
| | - Alba Sanchis-Juan
- Department of Haematology, NHS Blood and Transplant Centre, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Roman A. Laskowski
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - James A. Baker
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - James D. Stephenson
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - F. Lucy Raymond
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Janet M. Thornton
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| |
|
39
|
Mott TM, Ibarra JS, Kandula N, Senning EN. Mutagenesis studies of TRPV1 subunit interfaces informed by genomic variant analysis. Biophys J 2023; 122:322-332. [PMID: 36518076 PMCID: PMC9892609 DOI: 10.1016/j.bpj.2022.12.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 11/11/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022] Open
Abstract
Protein structures and mutagenesis studies have been instrumental in elucidating molecular mechanisms of ion channel function, but making informed choices about which residues to target for mutagenesis can be challenging. Therefore, we investigated the potential for using human population genomic data to further refine our selection of mutagenesis sites in TRPV1. Single nucleotide polymorphism data of TRPV1 from gnomAD 2.1.1 revealed a lower number of missense variants within buried residues of the ankyrin repeat domain and an increased number of variants between secondary structure elements of the transmembrane segments. We hypothesized that residues critical to interactions at interfaces between subunits or domains in the channel would exhibit a similar reduction in variants. We identified in the structure of ground squirrel TRPV1 (PDB: 7LQY) a possible electrostatic network between K155 and K160 in the N-terminal ankyrin repeat domain and E761 and D762 in the C-terminus (K-KED). Consistent with our hypothesis for residues at key interface sites, none of the four residues have any variants reported in gnomAD 2.1.1. Ca2+ imaging of TRPV1 K-KED mutants confirmed significant roles for these residues, but we found that the electrostatic interaction is not essential since channel function is still observed in total charge reversals on the C-terminal side of the interface (E761K/D762K). Interestingly, Ca2+ imaging responses for a charge swap experiment with K155D/D762K showed partially restored wild-type responses. Using electrophysiology, we found that charge reversals on either K155 or D762 increased the baseline currents of TRPV1, and the charge swapped double mutant, K155D/D762K, partially restored baseline currents to wild-type levels. We interpret these results to mean that contacts across residues in the K-KED interface shift the equilibria of conformations to closed pore states. 
Our study demonstrates the utility and applicability of a combined missense variant and structure targeted investigation of residues at TRPV1 subunit interfaces.
Affiliation(s)
- Taylor M Mott
- Department of Neuroscience, The University of Texas at Austin, Austin, Texas 78712
| | - Jordan S Ibarra
- Department of Neuroscience, The University of Texas at Austin, Austin, Texas 78712
| | - Nivitha Kandula
- School of Medicine, University of Missouri-Kansas City, 5000 Holmes St, Kansas City, Missouri 64110
| | - Eric N Senning
- Department of Neuroscience, The University of Texas at Austin, Austin, Texas 78712
| |
|
40
|
Nallapareddy V, Bordin N, Sillitoe I, Heinzinger M, Littmann M, Waman VP, Sen N, Rost B, Orengo C. CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics 2023; 39:6989624. [PMID: 36648327 PMCID: PMC9887088 DOI: 10.1093/bioinformatics/btad029] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/03/2022] [Revised: 12/07/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on 1773 largest and 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available on https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed on https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vamsi Nallapareddy
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology—i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
| | - Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology—i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
| | - Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology—i12, Technical University of Munich (TUM), Garching/Munich 85748, Germany
- Institute for Advanced Study (TUM-IAS), Garching/Munich 85748, Germany
- TUM School of Life Sciences Weihenstephan (WZW) 85354, Germany
| | | |
|
41
|
Figueiredo-Nunes I, Trigueiro-Louro J, Rebelo-de-Andrade H. Exploring new antiviral targets for influenza and COVID-19: Mapping promising hot spots in viral RNA polymerases. Virology 2023; 578:45-60. [PMID: 36463618 PMCID: PMC9674405 DOI: 10.1016/j.virol.2022.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 10/27/2022] [Accepted: 11/03/2022] [Indexed: 11/19/2022]
Abstract
Influenza and COVID-19 are infectious respiratory diseases that represent a major public health concern with worldwide social and economic impact, and for which the available therapeutic options are not satisfactory. The RNA-dependent RNA polymerase (RdRp) has a central role in viral replication and thus represents a major target for the development of antiviral approaches. In this study, we focused on the Influenza A virus PB1 polymerase protein and the betacoronavirus nsp12 polymerase protein, considering their functional and structural similarities. We performed conservation and druggability analyses to map conserved druggable regions that may have functional or structural importance in these proteins. We identified the most promising new target regions for the discovery of potential polymerase inhibitors. Conserved druggable regions of putative interaction with favipiravir and molnupiravir were also mapped. We also compared and integrated the current findings with previous research.
Affiliation(s)
- Inês Figueiredo-Nunes
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003, Lisbon, Portugal
| | - João Trigueiro-Louro
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003, Lisbon, Portugal; Antiviral Resistance Lab, Research & Development Unit, Infectious Diseases Department, Instituto Nacional de Saúde Doutor Ricardo Jorge, IP, Av. Padre Cruz, 1649-016, Lisbon, Portugal
| | - Helena Rebelo-de-Andrade
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003, Lisbon, Portugal; Antiviral Resistance Lab, Research & Development Unit, Infectious Diseases Department, Instituto Nacional de Saúde Doutor Ricardo Jorge, IP, Av. Padre Cruz, 1649-016, Lisbon, Portugal
| |
|
42
|
Barrett C, Bura A, He Q, Huang F, Reidys C. The arithmetic topology of genetic alignments. J Math Biol 2023; 86:34. [PMID: 36695949 PMCID: PMC9875784 DOI: 10.1007/s00285-023-01868-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 01/03/2023] [Accepted: 01/06/2023] [Indexed: 01/26/2023]
Abstract
We propose a novel mathematical paradigm for the study of genetic variation in sequence alignments. This framework originates from extending the notion of pairwise relations, on which current analysis is based, to k-ary dissimilarity. This dissimilarity naturally leads to a generalization of simplicial complexes by endowing simplices with weights, compatible with the boundary operator. We introduce the notion of k-stances and dissimilarity complex, the former encapsulating arithmetic as well as topological structure expressing these k-ary relations. We study basic mathematical properties of dissimilarity complexes and show how this approach captures watershed moments of viral dynamics in the context of SARS-CoV-2 and H1N1 flu genomic data.
Affiliation(s)
- Christopher Barrett
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911 USA; Department of Computer Science, University of Virginia, 351 McCormick Road, Charlottesville, VA 22904 USA
| | - Andrei Bura
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911 USA
| | - Qijun He
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911 USA
| | - Fenix Huang
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911 USA
| | - Christian Reidys
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA, 22911, USA; Department of Mathematics, University of Virginia, 141 Cabell Drive, Charlottesville, VA, 22904, USA
| |
|
43
|
Deutsch N, Pajkos M, Erdős G, Dosztányi Z. DisCanVis: Visualizing integrated structural and functional annotations to better understand the effect of cancer mutations located within disordered proteins. Protein Sci 2023; 32:e4522. [PMID: 36452990 PMCID: PMC9793970 DOI: 10.1002/pro.4522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022]
Abstract
Intrinsically disordered proteins (IDPs) play important roles in a wide range of biological processes and have been associated with various diseases, including cancer. In the last few years, cancer genome projects have systematically collected genetic variations underlying multiple cancer types. In parallel, the number and different types of disordered proteins characterized by experimental methods have also significantly increased. Nevertheless, the role of IDPs in various types of cancer is still not well understood. In this work, we present DisCanVis, a novel visualization tool for cancer mutations with a special focus on IDPs. In order to aid the interpretation of observed mutations, genome level information is combined with information about the structural and functional properties of proteins. The web server enables users to inspect individual proteins, collect examples with existing annotations of protein disorder and associated function or to discover currently uncharacterized examples with likely disease relevance. Through a REST API interface and precompiled tables the analysis can be extended to a group of proteins.
Affiliation(s)
- Norbert Deutsch
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Mátyás Pajkos
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Gábor Erdős
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
| |
|
44
|
Parigger L, Krassnigg A, Schopper T, Singh A, Tappler K, Köchl K, Hetmann M, Gruber K, Steinkellner G, Gruber CC. Recent changes in the mutational dynamics of the SARS-CoV-2 main protease substantiate the danger of emerging resistance to antiviral drugs. Front Med (Lausanne) 2022; 9:1061142. [PMID: 36590977 PMCID: PMC9794616 DOI: 10.3389/fmed.2022.1061142] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022] Open
Abstract
Introduction: The current coronavirus pandemic is being combated worldwide by nontherapeutic measures and massive vaccination programs. Nevertheless, therapeutic options such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main-protease (Mpro) inhibitors are essential due to the ongoing evolution toward escape from natural or induced immunity. While antiviral strategies are vulnerable to the effects of viral mutation, the relatively conserved Mpro makes an attractive drug target: Nirmatrelvir, an antiviral targeting its active site, has been authorized for conditional or emergency use in several countries since December 2021, and a number of other inhibitors are under clinical evaluation. We analyzed recent SARS-CoV-2 genomic data, since early detection of potential resistances supports a timely counteraction in drug development and deployment, and discovered accelerated mutational dynamics of Mpro since early December 2021. Methods: We performed a comparative analysis of 10.5 million SARS-CoV-2 genome sequences available by June 2022 at GISAID to the NCBI reference genome sequence NC_045512.2. Amino-acid exchanges within high-quality regions in 69,878 unique Mpro sequences were identified and time- and in-depth sequence analyses including a structural representation of mutational dynamics were performed using in-house software. Results: The analysis showed a significant recent event of mutational dynamics in Mpro. We report a remarkable increase in mutational variability in an eight-residue long consecutive region (R188-G195) near the active site since December 2021. Discussion: The increased mutational variability in close proximity to an antiviral-drug binding site as described herein may suggest the onset of the development of antiviral resistance. This emerging diversity urgently needs to be further monitored and considered in ongoing drug development and lead optimization.
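The comparison step in the Methods of this abstract (identifying amino-acid exchanges relative to a reference sequence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' in-house software: it assumes the variant is already aligned to the reference at equal length, treats 'X' and '-' as low-quality positions to skip, and uses hypothetical function names and toy sequences rather than real Mpro data.

```python
from collections import Counter


def amino_acid_exchanges(reference, variant, offset=1):
    """List substitutions such as 'F3Y' between two pre-aligned sequences.

    Assumptions: equal-length inputs; positions where either sequence has
    'X' (ambiguous) or '-' (gap) are skipped as low quality.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"X", "-"}
    exchanges = []
    for index, (ref_aa, var_aa) in enumerate(zip(reference, variant)):
        if ref_aa != var_aa and ref_aa not in skip and var_aa not in skip:
            exchanges.append(f"{ref_aa}{index + offset}{var_aa}")
    return exchanges


def exchange_counts_by_position(reference, variants, offset=1):
    """Count how many variant sequences carry an exchange at each position."""
    counts = Counter()
    for variant in variants:
        for label in amino_acid_exchanges(reference, variant, offset):
            counts[int(label[1:-1])] += 1  # strip the two residue letters
    return counts


# Toy data (hypothetical five-residue fragment, not a real protease sequence).
toy_reference = "SGFRK"
toy_variants = ["SGYRK", "SGFKK", "SGYRX"]
toy_counts = exchange_counts_by_position(toy_reference, toy_variants)
```

Tracking such per-position counts over sampling dates is one simple way to see whether variability in a given region is rising; the `offset` parameter lets a fragment be numbered in the coordinates of the full protein.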
Affiliation(s)
- Lena Parigger
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
| | | | | | - Amit Singh
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
| | - Katharina Tappler
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
| | | | - Michael Hetmann
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
| | - Karl Gruber
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
| | - Georg Steinkellner
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
| | - Christian C. Gruber
- Innophore GmbH, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
| |
|
45
|
De-la-Cruz IM, Kariñho-Betancourt E, Núñez-Farfán J, Oyama K. Gene family evolution and natural selection signatures in Datura spp. (Solanaceae). Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.916762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Elucidating the diversification process of congeneric species makes it necessary to identify the factors promoting species variation and diversification. Comparative gene family analysis allows us to elucidate the evolutionary history of species by identifying common genetic/genomic mechanisms underlying species responses to biotic and abiotic environments at the genomic level. In this study, we analyzed the high-quality transcriptomes of four Datura species, D. inoxia, D. pruinosa, D. stramonium, and D. wrightii. We performed a thorough comparative gene family analysis to infer the role of selection in molecular variation, changes in protein physicochemical properties, and gain/loss of genes during their diversification processes. The results revealed common and species-specific signals of positive selection, physicochemical divergence and/or expansion of metabolic genes (e.g., transferases and oxidoreductases) associated with terpene and tropane metabolism and some resistance genes (R genes). The gene family analysis presented here is a valuable tool for understanding the genome evolution of economically and ecologically significant taxa such as the Solanaceae family.
|
46
|
Lam SD, Waman VP, Fraternali F, Orengo C, Lees J. Structural and energetic analyses of SARS-CoV-2 N-terminal domain characterise sugar binding pockets and suggest putative impacts of variants on COVID-19 transmission. Comput Struct Biotechnol J 2022; 20:6302-6316. [PMID: 36408455 PMCID: PMC9639386 DOI: 10.1016/j.csbj.2022.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 11/03/2022] [Accepted: 11/03/2022] [Indexed: 11/09/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 is an ongoing pandemic that causes significant health/socioeconomic burden. Variants of concern (VOCs) have emerged affecting transmissibility, disease severity and re-infection risk. Studies suggest that the N-terminal domain (NTD) of the spike protein may have a role in facilitating virus entry via sialic-acid receptor binding. Furthermore, most VOCs include novel NTD variants. Despite global sequence and structure similarity, most sialic-acid binding pockets in NTD vary across coronaviruses. Our work suggests ongoing evolutionary tuning of the sugar-binding pockets and recent analyses have shown that NTD insertions in VOCs tend to lie close to loops. We extended the structural characterisation of these sugar-binding pockets and explored whether variants could enhance sialic acid-binding. We found that recent NTD insertions in VOCs (i.e., Gamma, Delta and Omicron variants) and emerging variants of interest (VOIs) (i.e., Iota, Lambda and Theta variants) frequently lie close to sugar-binding pockets. For some variants, including the recent Omicron VOC, we find increases in predicted sialic acid-binding energy, compared to the original SARS-CoV-2, which may contribute to increased transmission. These binding observations are supported by molecular dynamics (MD) simulations. We examined the similarity of NTD across Betacoronaviruses to determine whether the sugar-binding pockets are sufficiently similar to be exploited in drug design. Whilst most pockets are too structurally variable, we detected a previously unknown highly structurally conserved pocket which can be investigated in pursuit of a generic pan-Betacoronavirus drug. Our structure-based analyses help rationalise the effects of VOCs and provide hypotheses for experiments. Our findings suggest a strong need for experimental monitoring of changes in NTD of VOCs.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
| | - Vaishali P. Waman
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Franca Fraternali
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Jonathan Lees
- Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
| |
|
47
|
Mollusc Crystallins: Physical and Chemical Properties and Phylogenetic Analysis. DIVERSITY 2022. [DOI: 10.3390/d14100827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The purpose of the present study was to perform bioinformatic analysis of crystallin diversity in aquatic molluscs based on the sequences in the NCBI Protein database. The objectives were as follows: (1) analysis of some physical and chemical properties of mollusc crystallins, (2) comparison of mollusc crystallins with zebrafish and cubomedusa Tripedalia cystophora crystallins, and (3) determination of the most probable candidates for the role of gastropod eye crystallins. The calculated average GRAVY values revealed that the majority of the seven crystallin groups, except for μ- and ζ-crystallins, were hydrophilic proteins. The predominant predicted secondary structures of the crystallins in most cases were α-helices and coils. The highest values of refractive index increment (dn/dc) were typical for crystallins of aquatic organisms with known lens protein composition (zebrafish, cubomedusa, and octopuses) and for S-crystallin of Pomacea canaliculata. The evolutionary relationships between the studied crystallins, obtained from multiple sequence alignments using Clustal Omega and MUSCLE, and the normalized conservation index, calculated by Mirny, showed that the most conserved proteins were Ω-crystallins and the most diverse were S-crystallins. The phylogenetic analysis of crystallins was generally consistent with modern mollusc taxonomy. Thus, α- and S-, and, possibly, J1A-crystallins, can be assumed to be the most likely candidates for the role of gastropod lens crystallins.
|
48
|
Girbig M, Xie J, Grötsch H, Libri D, Porrua O, Müller CW. Architecture of the yeast Pol III pre-termination complex and pausing mechanism on poly(dT) termination signals. Cell Rep 2022; 40:111316. [PMID: 36070694 DOI: 10.1016/j.celrep.2022.111316] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/10/2022] [Revised: 07/01/2022] [Accepted: 08/15/2022] [Indexed: 12/20/2022] Open
Abstract
RNA polymerase (Pol) III is specialized to transcribe short, abundant RNAs, for which it terminates transcription on polythymine (dT) stretches on the non-template (NT) strand. When Pol III reaches the termination signal, it pauses and forms the pre-termination complex (PTC). Here, we report cryoelectron microscopy (cryo-EM) structures of the yeast Pol III PTC and complementary functional states at resolutions of 2.7-3.9 Å. Pol III recognizes the poly(dT) termination signal with subunit C128 that forms a hydrogen-bond network with the NT strand and, thereby, induces pausing. Mutating key interacting residues interferes with transcription termination in vitro, impairs yeast growth, and causes global termination defects in vivo, confirming our structural results. Additional cryo-EM analysis reveals that C53-C37, a Pol III subcomplex and key termination factor, participates indirectly in Pol III termination. We propose a mechanistic model of Pol III transcription termination and rationalize why Pol III, unlike Pol I and Pol II, terminates on poly(dT) signals.
Affiliation(s)
- Mathias Girbig
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Juanjuan Xie
- Université de Paris, CNRS, Institut Jacques Monod, 75006 Paris, France
| | - Helga Grötsch
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Domenico Libri
- Université de Paris, CNRS, Institut Jacques Monod, 75006 Paris, France
| | - Odil Porrua
- Université de Paris, CNRS, Institut Jacques Monod, 75006 Paris, France
| | - Christoph W Müller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
| |
|
49
|
Simsek C, Bloemen M, Jansen D, Descheemaeker P, Reynders M, Van Ranst M, Matthijnssens J. Rotavirus vaccine-derived cases in Belgium: Evidence for reversion of attenuating mutations and alternative causes of gastroenteritis. Vaccine 2022; 40:5114-5125. [PMID: 35871871 DOI: 10.1016/j.vaccine.2022.06.082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 02/07/2023]
Abstract
Since the introduction of live-attenuated rotavirus vaccines in Belgium in 2006, surveillance has routinely detected rotavirus vaccine-derived strains. However, their genomic landscape and potential role in gastroenteritis have not been thoroughly investigated. We compared VP7 and VP4 nucleotide sequences obtained from rotavirus surveillance with the Rotarix vaccine sequence. As a result, we identified 80 vaccine-derived strains in 5125 rotavirus-positive infants with gastroenteritis from 2007 to 2018. Using both viral metagenomics and reverse transcription qPCR, we evaluated the vaccine strains and screened for co-infecting enteropathogens. Among the 45 patients with known vaccination status, 39 were vaccinated and 87% received the vaccine less than a month before the gastroenteritis episode. Reconstruction of 30 near-complete vaccine-derived genomes revealed 0-11 mutations per genome, with 88% of them being non-synonymous. This, in combination with several shared amino acid changes among strains, pointed to selection of minor variant(s) present in the vaccine. We also found that some of these substitutions were true revertants (e.g., F167L on VP4, and I45T on NSP4). Finally, co-infections with known (e.g., Clostridioides difficile and norovirus) and divergent or emerging (e.g., human parechovirus A1, salivirus A2) pathogens were detected, and we estimated that 35% of the infants likely had gastroenteritis due to a 'non-rotavirus' cause. Conversely, we could not rule out vaccine-derived gastroenteritis in over half of the cases. Continued studies inspecting reversion to pathogenicity should monitor the long-term safety of live-attenuated rotavirus vaccines. All in all, the complementary approach with NGS and qPCR provided a better understanding of rotavirus vaccine strain evolution in the Belgian population and epidemiology of co-infecting enteropathogens in suspected rotavirus vaccine-derived gastroenteritis cases.
Affiliation(s)
- Ceren Simsek
- KU Leuven - University of Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Leuven, Belgium
| | - Mandy Bloemen
- KU Leuven - University of Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Leuven, Belgium
| | - Daan Jansen
- KU Leuven - University of Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Leuven, Belgium
| | - Patrick Descheemaeker
- Department of Laboratory Medicine, Medical Microbiology, AZ Sint-Jan, Brugge-Oostende AV, Bruges, Belgium
| | - Marijke Reynders
- Department of Laboratory Medicine, Medical Microbiology, AZ Sint-Jan, Brugge-Oostende AV, Bruges, Belgium
| | - Marc Van Ranst
- KU Leuven - University of Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Leuven, Belgium
| | - Jelle Matthijnssens
- KU Leuven - University of Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Leuven, Belgium
| |
|
50
|
Sen N, Anishchenko I, Bordin N, Sillitoe I, Velankar S, Baker D, Orengo C. Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs. Brief Bioinform 2022; 23:bbac187. [PMID: 35641150 PMCID: PMC9294430 DOI: 10.1093/bib/bbac187] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/05/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022] Open
Abstract
Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.
Affiliation(s)
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| |
|