1
Herrera LPT, Andreassen SN, Caroli J, Rodríguez-Espigares I, Kermani AA, Keserű GM, Kooistra AJ, Pándy-Szekeres G, Gloriam DE. GPCRdb in 2025: adding odorant receptors, data mapper, structure similarity search and models of physiological ligand complexes. Nucleic Acids Res 2024:gkae1065. [PMID: 39558158 DOI: 10.1093/nar/gkae1065] [Received: 09/23/2024] [Revised: 10/17/2024] [Accepted: 10/22/2024] [Indexed: 11/20/2024] Open
Abstract
G protein-coupled receptors (GPCRs) are membrane-spanning transducers mediating the actions of numerous physiological ligands and drugs. The GPCR database GPCRdb supports a large global research community with reference data, analysis, visualization, experiment design and dissemination. Here, we describe our sixth major GPCRdb release starting with an overview of all resources for receptors and ligands. As a major addition, all ∼400 human odorant receptors and their orthologs in major model organisms can now be studied across the various data and tool resources. For the first time, a Data mapper page enables users to map their own data onto receptors visualized as a GPCRome wheel, tree, clusters, list or heatmap. The structure model data have been expanded with models of physiological ligand complexes and updated with new state-specific structure models of all human GPCRs (built using AlphaFold, RoseTTAFold and AlphaFold-Multistate). Furthermore, a structure or model (pdb file) can now be queried against GPCRdb's entire structure/model collection through a Structure similarity search page implementing FoldSeek. Finally, for ligands, new search tools can query names, database identifiers, similarities or substructures against integrated entries from the ChEMBL, Guide to Pharmacology, PDSP Ki, PubChem, DrugCentral and DrugBank databases. GPCRdb is available at https://gpcrdb.org.
Affiliation(s)
- Luis P Taracena Herrera
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Søren N Andreassen
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Jimmy Caroli
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Ismael Rodríguez-Espigares
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Ali A Kermani
  - Department of Structural Biology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105-3678, USA
- György M Keserű
  - Medicinal Chemistry Research Group, HUN-REN Research Center for Natural Sciences, Magyar tudósok körútja 2., Budapest H-1117, Hungary
- Albert J Kooistra
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Gáspár Pándy-Szekeres
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
  - Medicinal Chemistry Research Group, HUN-REN Research Center for Natural Sciences, Magyar tudósok körútja 2., Budapest H-1117, Hungary
- David E Gloriam
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
2
Gagalova KK, Yan Y, Wang S, Matzat T, Castellarin SD, Birol I, Edwards D, Schuetz M. Leaf pigmentation in Cannabis sativa: Characterization of anthocyanin biosynthesis in colorful Cannabis varieties. PLANT DIRECT 2024; 8:e70016. [PMID: 39600728 PMCID: PMC11588432 DOI: 10.1002/pld3.70016] [Received: 01/21/2024] [Revised: 08/19/2024] [Accepted: 10/01/2024] [Indexed: 11/29/2024]
Abstract
Cannabis plants produce a spectrum of secondary metabolites, encompassing cannabinoids and more than 300 non-cannabinoid compounds. Among these, anthocyanins have important functions in plants and also have well documented health benefits. Anthocyanins are largely responsible for the red/purple color phenotypes in plants. Although some well-known Cannabis varieties display a wide range of red/purple pigmentation, the genetic underpinnings of anthocyanin biosynthesis have not been well characterized in Cannabis. This study unveils the genetic diversity of anthocyanin biosynthesis genes found in Cannabis, and we characterize the diversity of anthocyanins and related phenolics found in four differently pigmented Cannabis varieties. Our investigation revealed that the genes 4CL, CHS, F3H, F3'H, FLS, DFR, ANS, and OMT exhibited the strongest correlation with anthocyanin accumulation in Cannabis leaves. The results of this study enhance our understanding of the anthocyanin biosynthetic pathway and shed light on the molecular mechanisms governing Cannabis leaf pigmentation.
Affiliation(s)
- Kristina K. Gagalova
  - Centre for Crop and Disease Management, School of Molecular and Life Sciences, Curtin University, Perth, WA, Australia
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Yifan Yan
  - Wine Research Centre, University of British Columbia, Vancouver, BC, Canada
- Shumin Wang
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada
- Till Matzat
  - Wine Research Centre, University of British Columbia, Vancouver, BC, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, Western Australia, Australia
- Mathias Schuetz
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada
  - Department of Biology, Kwantlen Polytechnic University, Surrey, BC, Canada
3
Soares R, Fonseca BM, Nash BW, Paquete CM, Louro RO. A survey of the Desulfuromonadia "cytochromome" provides a glimpse of the unexplored diversity of multiheme cytochromes in nature. BMC Genomics 2024; 25:982. [PMID: 39428470 PMCID: PMC11492766 DOI: 10.1186/s12864-024-10872-4] [Received: 01/30/2024] [Accepted: 10/07/2024] [Indexed: 10/22/2024] Open
Abstract
BACKGROUND: Multiheme cytochromes c (MHC) provide prokaryotes with a broad metabolic versatility that contributes to their role in the biogeochemical cycling of the elements and in energy production in bioelectrochemical systems. However, MHC have only been isolated and studied in detail from a limited number of species. Among these, Desulfuromonadia spp. are particularly MHC-rich. To obtain a broad view of the diversity of MHC, we employed bioinformatic tools to study the cytochromome encoded in the genomes of the Desulfuromonadia class. RESULTS: We found that the distribution of the MHC families follows a different pattern between the two orders of the Desulfuromonadia class and that there is great diversity in the number of heme-binding motifs in MHC. However, the vast majority of MHC have up to 12 heme-binding motifs. MHC predicted to be extracellular are the least conserved and show high diversity, whereas inner membrane MHC are well conserved and show lower diversity. Although the most prevalent MHC have homologues already characterized, nearly half of the MHC families in the Desulfuromonadia class have no known characterized homologues. AlphaFold2 was employed to predict their 3D structures. This provides an atlas of novel MHC, including examples with high beta-sheet content and nanowire MHC with unprecedented high numbers of putative heme cofactors per polypeptide. CONCLUSIONS: This work illuminates for the first time the universe of experimentally uncharacterized cytochromes that are likely to contribute to the metabolic versatility and to the fitness of Desulfuromonadia in diverse environmental conditions and to drive biotechnological applications of these organisms.
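The abstract above refers to counting heme-binding motifs per protein but does not say how the authors did it. As a minimal sketch only, the snippet below assumes the canonical c-type heme-binding pattern CXXCH (two cysteines separated by any two residues, followed by histidine) and counts its occurrences in a sequence. The function name `count_heme_motifs` and the example sequences are hypothetical, and variant motifs are deliberately ignored.

```python
import re

# Assumption: only the canonical CXXCH pattern is counted.
# The lookahead lets overlapping occurrences be counted separately.
HEME_MOTIF = re.compile(r"(?=C..CH)")


def count_heme_motifs(sequence: str) -> int:
    """Return the number of CXXCH occurrences in a one-letter protein sequence."""
    return len(HEME_MOTIF.findall(sequence.upper()))


if __name__ == "__main__":
    # Made-up toy sequence, not taken from any genome.
    print(count_heme_motifs("MACAACHGGCKKCHA"))
```

Counting overlapping occurrences is a design choice of this sketch; a stricter count could use `re.finditer` on the non-lookahead pattern instead.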
Affiliation(s)
- Ricardo Soares
  - Av da República (EAN), Instituto de Tecnologia Química e Biológica António Xavier da Universidade Nova de Lisboa, Oeiras, 2780-157, Portugal
  - Instituto Nacional de Investigação Agrária e Veterinária, Oeiras, Portugal
- Bruno M Fonseca
  - Av da República (EAN), Instituto de Tecnologia Química e Biológica António Xavier da Universidade Nova de Lisboa, Oeiras, 2780-157, Portugal
- Benjamin W Nash
  - School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Catarina M Paquete
  - Av da República (EAN), Instituto de Tecnologia Química e Biológica António Xavier da Universidade Nova de Lisboa, Oeiras, 2780-157, Portugal
- Ricardo O Louro
  - Av da República (EAN), Instituto de Tecnologia Química e Biológica António Xavier da Universidade Nova de Lisboa, Oeiras, 2780-157, Portugal.
4
Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024; 16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Accepted: 10/11/2024] [Indexed: 11/01/2024] Open
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrate diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Affiliation(s)
- Felix Langschied
  - Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Salvatore Cosentino
  - Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
- Diego Fuentes-Palacios
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Natasha Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Michael Hiller
  - Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Yanhui Hu
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Wataru Iwasaki
  - Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Saioa Manzano-Morales
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Emma Persson
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Toni Gabaldón
  - Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
  - CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Erik Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Ingo Ebersberger
  - Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
5
Milon TI, Wang Y, Fontenot RL, Khajouie P, Villinger F, Raghavan V, Xu W. Development of a novel representation of drug 3D structures and enhancement of the TSR-based method for probing drug and target interactions. Comput Biol Chem 2024; 112:108117. [PMID: 38852360 PMCID: PMC11390338 DOI: 10.1016/j.compbiolchem.2024.108117] [Received: 03/05/2024] [Revised: 05/13/2024] [Accepted: 05/31/2024] [Indexed: 06/11/2024]
Abstract
Understanding the mechanisms underlying interactions between drugs and target proteins is critical for drug discovery. In our earlier studies, we introduced the Triangular Spatial Relationship (TSR)-based algorithm, which enables the representation of a protein's 3D structure as a vector of integers (TSR keys). These TSR keys correspond to substructures of the 3D structure of a protein and are computed based on the triangles constructed by all possible triples of Cα atoms within the protein. In this study, we report on a new TSR-based algorithm for probing drug and target interactions. Specifically, we have extended the previous algorithm in three novel directions: TSR keys for representing the 3D structure of a drug or a ligand, cross TSR keys between drugs and their targets and intra-residual TSR keys for phosphorylated amino acids. The outcomes illustrate the key contributions as follows: (i) The TSR-based method, which uses the TSR keys as features, is unique in its capability to interpret hierarchical relationships of drugs as well as drug-target complexes using common and specific TSR keys. (ii) The method can distinguish not only the binding sites from the rest of the protein structures, but also the binding sites of primary targets from those of off-targets. (iii) The method has the potential to correlate the 3D structures of drugs with their functions. (iv) Representation of 3D structures by TSR keys has the unique advantage of making it easier to search for similar substructures across structure datasets. In summary, this study presents a novel computational methodology, with significant advantages, for providing insights into the mechanism underlying drug and target interactions.
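To make the idea of turning every triple of Cα atoms into an integer key more concrete, here is a toy sketch. It is not the published TSR key formula, which the abstract does not give: the residue labels in `LABELS`, the bin width, and the way labels and binned side lengths are packed into one integer are all assumptions chosen for illustration, and the names `toy_triangle_key` and `toy_keys` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical residue-to-integer labels; the published method defines its own.
LABELS = {"ALA": 1, "GLY": 2, "SER": 3, "LEU": 4}


def toy_triangle_key(triple, bin_width=2.0, n_bins=32):
    """Pack one triangle of (residue_name, (x, y, z)) atoms into an integer.

    Sorting the labels and the side lengths makes the key independent of
    atom order, rotation and translation.
    """
    labels = sorted(LABELS[name] for name, _ in triple)
    coords = [xyz for _, xyz in triple]
    sides = sorted(math.dist(a, b) for a, b in combinations(coords, 2))
    bins = [min(int(s // bin_width), n_bins - 1) for s in sides]
    key = 0
    # Mixed-radix packing: three label digits (base 32), three length digits.
    for value, base in zip(labels + bins, [32, 32, 32] + [n_bins] * 3):
        key = key * base + value
    return key


def toy_keys(ca_atoms):
    """Count the keys of all possible Cα triples in a structure."""
    return Counter(toy_triangle_key(t) for t in combinations(ca_atoms, 3))
```

Two structures could then be compared by the keys they share, for example with `toy_keys(a) & toy_keys(b)`, which keeps the common keys and their minimum counts.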
Affiliation(s)
- Tarikul I Milon
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Yuhong Wang
  - National Center for Advancing Translational Sciences, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Ryan L Fontenot
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Poorya Khajouie
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA; The Center for Advanced Computer Studies, University of Louisiana at Lafayette, LA 70504, USA
- Francois Villinger
  - Department of Biology, University of Louisiana at Lafayette, New Iberia, LA 70560, USA
- Vijay Raghavan
  - The Center for Advanced Computer Studies, University of Louisiana at Lafayette, LA 70504, USA
- Wu Xu
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA.
6
Haidurov A, Budanov AV. Locked in Structure: Sestrin and GATOR - A Billion-Year Marriage. Cells 2024; 13:1587. [PMID: 39329768 PMCID: PMC11429811 DOI: 10.3390/cells13181587] [Received: 07/11/2024] [Revised: 09/16/2024] [Accepted: 09/17/2024] [Indexed: 09/28/2024] Open
Abstract
Sestrins are a conserved family of stress-responsive proteins that play a crucial role in cellular metabolism, stress response, and ageing. Vertebrates have three Sestrin genes (SESN1, SESN2, and SESN3), while invertebrates encode only one. Initially identified as antioxidant proteins that regulate cell viability, Sestrins are now recognised as crucial inhibitors of the mechanistic target of rapamycin complex 1 kinase (mTORC1), a central regulator of anabolism, cell growth, and autophagy. Sestrins suppress mTORC1 through an inhibitory interaction with the GATOR2 protein complex, which, in concert with GATOR1, signals to inhibit the lysosomal docking of mTORC1. A leucine-binding pocket (LBP) is found in most vertebrate Sestrins, and when bound with leucine, Sestrins do not bind GATOR2, prompting mTORC1 activation. This review examines the evolutionary conservation of Sestrins and their functional motifs, focusing on their origins and development. We highlight that the most conserved regions of Sestrins are those involved in GATOR2 binding, and while analogues of Sestrins exist in prokaryotes, the unique feature of eukaryotic Sestrins is their structural presentation of GATOR2-binding motifs.
Affiliation(s)
- Alexander Haidurov
  - School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Pearse Street, D02 R590 Dublin, Ireland
- Andrei V. Budanov
  - School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Pearse Street, D02 R590 Dublin, Ireland
7
Ponamareva I, Andreeva A, Bileschi ML, Colwell L, Bateman A. Investigation of protein family relationships with deep learning. BIOINFORMATICS ADVANCES 2024; 4:vbae132. [PMID: 39399373 PMCID: PMC11467057 DOI: 10.1093/bioadv/vbae132] [Received: 11/14/2023] [Revised: 08/01/2024] [Accepted: 09/17/2024] [Indexed: 10/15/2024]
Abstract
Motivation: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use ProtENN2's per-residue embeddings to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores based on these embeddings, and evaluate its predictions using structure comparison. Results: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, which suggests directions for future improvements. Our method is relatively simple with few parameters and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning for improving our understanding of protein family relationships and functions of previously uncharacterized families. Availability and implementation: github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.
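The abstract describes going from per-residue embeddings to per-family embeddings and then to inter-family similarity scores, without stating the aggregation or the score. One simple way to picture such a pipeline, assuming mean pooling and cosine similarity (both are assumptions of this sketch, not the authors' stated choices), is shown below. The function names are hypothetical and the vectors are made-up toy data rather than ProtENN2 output.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def family_embedding(family):
    """Pool residues into one vector per sequence, then sequences per family.

    `family` is a list of sequences; each sequence is a list of
    per-residue embedding vectors.
    """
    return mean_vector([mean_vector(sequence) for sequence in family])


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy two-dimensional "embeddings" for two invented families.
    family_a = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0]]]
    family_b = [[[2.0, 0.0], [4.0, 0.0]]]
    print(cosine_similarity(family_embedding(family_a), family_embedding(family_b)))
```

Mean pooling discards residue order and sequence length, which keeps the sketch short; a real pipeline might weight residues or sequences differently.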
Affiliation(s)
- Irina Ponamareva
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Antonina Andreeva
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- Lucy Colwell
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
  - Google Research, Cambridge, MA 02142, United States
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
8
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024; 34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024] Open
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that confers deoxyadenosine deaminase activity on TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
Affiliation(s)
- Peng Cheng
  - Bioinformatics Center of AMMS, Beijing, China
- Cong Mao
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Jin Tang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Sen Yang
  - Bioinformatics Center of AMMS, Beijing, China
- Yu Cheng
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuke Wang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Qiuxi Gu
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wei Han
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Hao Chen
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Sihan Li
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuju Li
  - Bioinformatics Center of AMMS, Beijing, China
- Aimin Pan
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Suwen Zhao
  - iHuman Institute, ShanghaiTech University, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xingxu Huang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jun Zhang
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China.
- Wenjie Shu
  - Bioinformatics Center of AMMS, Beijing, China.
9
Feidakis CP, Krivak R, Hoksza D, Novotny M. AHoJ-DB: A PDB-wide Assignment of apo & holo Relationships Based on Individual Protein-Ligand Interactions. J Mol Biol 2024; 436:168545. [PMID: 38508305 DOI: 10.1016/j.jmb.2024.168545] [Received: 01/31/2024] [Revised: 03/12/2024] [Accepted: 03/14/2024] [Indexed: 03/22/2024]
Abstract
A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein-ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data. Availability: www.apoholo.cz/db.
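The annotation step described above, labelling a mapped pocket as apo or holo depending on whether a ligand is present, can be pictured with a small distance check. This is a minimal sketch and not the AHoJ implementation: the function name `annotate_pocket`, the plain coordinate tuples, and the 4.5 Å contact cutoff are all assumptions made for illustration.

```python
import math


def annotate_pocket(pocket_atoms, ligand_atoms, cutoff=4.5):
    """Label a pocket "holo" if any ligand atom lies within `cutoff` of it.

    Both arguments are lists of (x, y, z) coordinates in ångströms.
    The cutoff value is an assumed contact distance, not one taken
    from the paper.
    """
    for pocket_xyz in pocket_atoms:
        for ligand_xyz in ligand_atoms:
            if math.dist(pocket_xyz, ligand_xyz) <= cutoff:
                return "holo"
    return "apo"


if __name__ == "__main__":
    pocket = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    print(annotate_pocket(pocket, [(3.0, 0.0, 0.0)]))
    print(annotate_pocket(pocket, []))
```

A structure with no ligand atoms at all, or with ligands far from the mapped pocket, falls through to "apo" in this sketch.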
Affiliation(s)
- Christos P Feidakis
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic.
- Radoslav Krivak
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague 12116, Czech Republic; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague 16000, Czech Republic
- David Hoksza
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague 12116, Czech Republic
- Marian Novotny
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic.
10
Kabir A, Moldwin A, Bromberg Y, Shehu A. In the twilight zone of protein sequence homology: do protein language models learn protein structure? BIOINFORMATICS ADVANCES 2024; 4:vbae119. [PMID: 39183802 PMCID: PMC11344590 DOI: 10.1093/bioadv/vbae119] [Received: 04/01/2024] [Revised: 08/01/2024] [Accepted: 08/12/2024] [Indexed: 08/27/2024]
Abstract
Motivation: Protein language models based on the transformer architecture are increasingly improving performance on protein prediction tasks, including secondary structure, subcellular localization, and more. Despite being trained only on protein sequences, protein language models appear to implicitly learn protein structure. This paper investigates whether sequence representations learned by protein language models encode structural information and to what extent. Results: We address this by evaluating protein language models on remote homology prediction, where identifying remote homologs from sequence information alone requires structural knowledge, especially in the "twilight zone" of very low sequence identity. Through rigorous testing at progressively lower sequence identities, we profile the performance of protein language models ranging from millions to billions of parameters in a zero-shot setting. Our findings indicate that while transformer-based protein language models outperform traditional sequence alignment methods, they still struggle in the twilight zone. This suggests that current protein language models have not sufficiently learned protein structure to address remote homology prediction when sequence signals are weak. Availability and implementation: We believe this opens the way for further research both on remote homology prediction and on the broader goal of learning sequence- and structure-rich representations of protein molecules. All code, data, and models are made publicly available.
Affiliation(s)
- Anowarul Kabir
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, United States
- Asher Moldwin
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, United States
- Yana Bromberg
  - Department of Computer Science, Emory University, Atlanta, GA 30307, United States
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, United States
|
11
|
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
During de novo emergence, new protein-coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins do exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine-learning-based tools and found that most de novo proteins are indeed different from conserved proteins in both structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
|
12
|
Yoon PH, Zhang Z, Loi KJ, Adler BA, Lahiri A, Vohra K, Shi H, Rabelo DB, Trinidad M, Boger RS, Al-Shimary MJ, Doudna JA. Structure-guided discovery of ancestral CRISPR-Cas13 ribonucleases. Science 2024; 385:538-543. [PMID: 39024377 DOI: 10.1126/science.adq0553] [Received: 04/24/2024] [Accepted: 07/02/2024] [Indexed: 07/20/2024]
Abstract
The RNA-guided ribonuclease CRISPR-Cas13 enables adaptive immunity in bacteria and programmable RNA manipulation in heterologous systems. Cas13s share limited sequence similarity, hindering discovery of related or ancestral systems. To address this, we developed an automated structural-search pipeline to identify an ancestral clade of Cas13 (Cas13an) and further trace Cas13 origins to defense-associated ribonucleases. Despite being one-third the size of other Cas13s, Cas13an mediates robust programmable RNA depletion and defense against diverse bacteriophages. However, unlike its larger counterparts, Cas13an uses a single active site for both CRISPR RNA processing and RNA-guided cleavage, revealing that the ancestral nuclease domain has two modes of activity. Discovery of Cas13an deepens our understanding of CRISPR-Cas evolution and expands opportunities for precision RNA editing, showcasing the promise of structure-guided genome mining.
Affiliation(s)
- Peter H Yoon
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Zeyuan Zhang
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Kenneth J Loi
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
- Benjamin A Adler
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Arushi Lahiri
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
- Kamakshi Vohra
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Honglue Shi
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Daniel Bellieny Rabelo
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
- Marena Trinidad
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Ron S Boger
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, USA
- Muntathar J Al-Shimary
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Jennifer A Doudna
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA
  - Gladstone Institutes, San Francisco, CA, USA
  - Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA, USA
|
13
|
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Protein structure prediction is important for understanding protein function and behavior. This review presents a comprehensive survey of the computational models used in predicting protein structure. It covers the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section covers deep learning-based models. It introduces some state-of-the-art AI models, such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, and ProteinBERT. This section also discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. The template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for further research on topics such as dynamic protein behavior, conformational changes, and protein-protein interactions. The application section introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
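As an illustration of one metric named in this abstract, the sketch below computes a simplified GDT_TS: the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose model position lies within the cutoff of the reference position. It is not taken from the review. It assumes the two coordinate lists are already superimposed and residue-matched, whereas the full GDT procedure searches over many superpositions, so this value is generally a lower bound on the reported score. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for pre-superimposed, residue-matched coordinates.

    Assumption: no superposition search is performed here, unlike the full GDT method.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Per-residue deviation between model and reference positions (angstroms).
    deviations = [math.dist(m, r) for m, r in zip(model, reference)]
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in deviations) / len(deviations) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (5.3, 0.0, 0.0), (10.6, 0.0, 0.0), (20.4, 0.0, 0.0)]
score = gdt_ts(model, reference)
```

In the toy example the per-cutoff fractions are 1/4, 2/4, 3/4 and 3/4, so the score is 56.25.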
Affiliation(s)
- Lingtao Chen
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Qiaomu Li
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Kazi Fahim Ahmad Nasif
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Ying Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Bobin Deng
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Shuteng Niu
  - Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA
- Seyedamin Pouriyeh
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Zhiyu Dai
  - Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
- Jiawei Chen
  - College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA
- Chloe Yixin Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
|
14
|
Wirnsberger G, Pritišanac I, Oberdorfer G, Gruber K. Flattening the curve-How to get better results with small deep-mutational-scanning datasets. Proteins 2024; 92:886-902. [PMID: 38501649 DOI: 10.1002/prot.26686] [Received: 11/08/2023] [Revised: 02/24/2024] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Proteins are used in various biotechnological applications, often requiring the optimization of protein properties by introducing specific amino-acid exchanges. Deep mutational scanning (DMS) is an effective high-throughput method for evaluating the effects of these exchanges on protein function. DMS data can then inform the training of a neural network to predict the impact of mutations. Most approaches use some representation of the protein sequence for training and prediction. As proteins are characterized by complex structures and intricate residue interaction networks, directly providing structural information as input reduces the need to learn these features from the data. We introduce a method for encoding protein structures as stacked 2D contact maps, which capture residue interactions, their evolutionary conservation, and mutation-induced interaction changes. Furthermore, we explored techniques to augment neural network training performance on smaller DMS datasets. To validate our approach, we trained three neural network architectures originally used for image analysis on three DMS datasets, and we compared their performances with networks trained solely on protein sequences. The results confirm the effectiveness of the protein structure encoding in machine learning efforts on DMS data. Using structural representations as direct input to the networks, along with data augmentation and pretraining, significantly reduced demands on training data size and improved prediction performance, especially on smaller datasets, while performance on large datasets was on par with state-of-the-art sequence convolutional neural networks. The methods presented here have the potential to provide the same workflow as DMS without the experimental and financial burden of testing thousands of mutants. Additionally, we present an open-source, user-friendly software tool to make these data analysis techniques accessible, particularly to biotechnology and protein engineering researchers who wish to apply them to their mutagenesis data.
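To make the idea of "stacked 2D contact maps" concrete, here is a minimal sketch and not the authors' encoding. It assumes one C-alpha coordinate per residue, a distance cutoff in ångströms as the definition of a contact, and a second channel built with a tighter cutoff purely as a stand-in for the conservation and mutation-effect channels the abstract mentions. The names `contact_map` and `stack_maps` and both cutoffs are hypothetical choices for illustration.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return an L x L 0/1 matrix: 1 where two different residues lie within `cutoff` angstroms."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]

def stack_maps(*maps):
    """Stack several L x L maps into an L x L x C nested list (channels last), like image channels."""
    n = len(maps[0])
    return [[[m[i][j] for m in maps] for j in range(n)] for i in range(n)]

# Toy chain: four C-alpha atoms on a line, 3.8 angstroms apart (hypothetical data).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
broad = contact_map(ca, cutoff=8.0)   # channel 0: contacts up to 8 angstroms
tight = contact_map(ca, cutoff=4.0)   # channel 1: placeholder for a second feature channel
stacked = stack_maps(broad, tight)    # shape 4 x 4 x 2
```

A tensor of this shape can be passed to image-style convolutional networks, which is the general motivation the abstract gives for a 2D encoding.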
Affiliation(s)
- Iva Pritišanac
  - Institute of Molecular Biology and Biochemistry, Medical University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Gustav Oberdorfer
  - BioTechMed-Graz, Graz, Austria
  - Institute of Biochemistry, Graz University of Technology, Graz, Austria
- Karl Gruber
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
|
15
|
De Coninck T, Gippert GP, Henrissat B, Desmet T, Van Damme EJM. Investigating diversity and similarity between CBM13 modules and ricin-B lectin domains using sequence similarity networks. BMC Genomics 2024; 25:643. [PMID: 38937673 PMCID: PMC11212257 DOI: 10.1186/s12864-024-10554-1] [Received: 02/26/2024] [Accepted: 06/24/2024] [Indexed: 06/29/2024] Open
Abstract
Background: The CBM13 family comprises carbohydrate-binding modules that occur mainly in enzymes and in several ricin-B lectins. The ricin-B lectin domain resembles the CBM13 module to a large extent. Historically, ricin-B lectins and CBM13 proteins were considered completely distinct, despite their structural and functional similarities. Results: In this data mining study, we investigate structural and functional similarities of these intertwined protein groups. Because of the high structural and functional similarities, and differences in nomenclature usage in several databases, confusion can arise. First, we demonstrate how public protein databases use different nomenclature systems to describe CBM13 modules and putative ricin-B lectin domains. We suggest the introduction of a novel CBM13 domain identifier, as well as the extension of CAZy cross-references in UniProt to guard the distinction between CAZy and non-CAZy entries in public databases. Since similar problems may occur with other lectin families and CBM families, we suggest the introduction of novel CBM InterPro domain identifiers to all existing CBM families. Second, we investigated phylogenetic, nomenclatural and structural similarities between putative ricin-B lectin domains and CBM13 modules, making use of sequence similarity networks. We concluded that the ricin-B/CBM13 superfamily may be larger than initially thought and that several putative ricin-B lectin domains may display CAZyme functionalities, although biochemical proof remains to be delivered. Conclusions: Ricin-B lectin domains and CBM13 modules are associated groups of proteins whose database semantics are currently biased towards ricin-B lectins. Revision of the CAZy cross-reference in UniProt and introduction of a dedicated CBM13 domain identifier in InterPro may resolve this issue. In addition, our analyses show that several proteins with putative ricin-B lectin domains show very strong structural similarity to CBM13 modules. Therefore ricin-B lectin domains and CBM13 modules could be considered distant members of a larger ricin-B/CBM13 superfamily.
Affiliation(s)
- Tibo De Coninck
  - Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
  - Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Garry P Gippert
  - Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Bernard Henrissat
  - Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Tom Desmet
  - Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Els J M Van Damme
  - Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
|
16
|
Koper K, Han SW, Kothadia R, Salamon H, Yoshikuni Y, Maeda HA. Multisubstrate specificity shaped the complex evolution of the aminotransferase family across the tree of life. Proc Natl Acad Sci U S A 2024; 121:e2405524121. [PMID: 38885378 PMCID: PMC11214133 DOI: 10.1073/pnas.2405524121] [Received: 04/02/2024] [Accepted: 05/14/2024] [Indexed: 06/20/2024] Open
Abstract
Aminotransferases (ATs) are an ancient enzyme family that plays central roles in core nitrogen metabolism, essential to all organisms. However, many of the AT enzyme functions remain poorly defined, limiting our fundamental understanding of the nitrogen metabolic networks that exist in different organisms. Here, we traced the deep evolutionary history of the AT family by analyzing AT enzymes from 90 species spanning the tree of life (ToL). We found that each organism has maintained a relatively small and constant number of ATs. Mapping the distribution of ATs across the ToL uncovered that many essential AT reactions are carried out by taxon-specific AT enzymes due to widespread nonorthologous gene displacements. This complex evolutionary history explains the difficulty of homology-based AT functional prediction. Biochemical characterization of diverse aromatic ATs further revealed their broad substrate specificity, unlike other core metabolic enzymes that evolved to catalyze specific reactions today. Interestingly, however, we found that these AT enzymes, which diverged over a billion years, share common signatures of multisubstrate specificity by employing different nonconserved active site residues. These findings illustrate that AT family enzymes have leveraged their inherent substrate promiscuity to maintain a small yet distinct set of multifunctional AT enzymes in different taxa. This evolutionary history of versatile ATs likely contributed to the establishment of robust and diverse nitrogen metabolic networks that exist throughout the ToL. The study provides a critical foundation to systematically determine diverse AT functions and underlying nitrogen metabolic networks across the ToL.
Affiliation(s)
- Kaan Koper
  - Department of Botany, University of Wisconsin-Madison, Madison, WI 53706
- Sang-Woo Han
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  - Department of Biotechnology, Konkuk University, Chungju 27478, South Korea
- Ramani Kothadia
  - The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Hugh Salamon
  - The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Yasuo Yoshikuni
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  - The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  - Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
  - Global Center for Food, Land, and Water Resources, Research Faculty of Agriculture, Hokkaido University, Hokkaido, Japan 060-8589
  - Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo 183-8538, Japan
- Hiroshi A. Maeda
  - Department of Botany, University of Wisconsin-Madison, Madison, WI 53706
|
17
|
Dahlström KM, Salminen TA. Apprehensions and emerging solutions in ML-based protein structure prediction. Curr Opin Struct Biol 2024; 86:102819. [PMID: 38631107 DOI: 10.1016/j.sbi.2024.102819] [Received: 01/19/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
The three-dimensional structure of proteins determines their function in vital biological processes. Thus, when the structure is known, the molecular mechanism of protein function can be understood in more detail and the information obtained can be used in biotechnological, diagnostic, and therapeutic applications. Over the past five years, machine learning (ML)-based modeling has pushed protein structure prediction to the next level, with AlphaFold at the forefront, predicting the structure of hundreds of millions of proteins. Recent advances report promising ML-based approaches for solving the remaining challenges by incorporating functionally important metals, co-factors, post-translational modifications, structural dynamics, and interdomain and multimer interactions in the structure prediction process.
Affiliation(s)
- Käthe M Dahlström
  - Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
- Tiina A Salminen
  - Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
|
18
|
Yu Y, Trottmann NF, Schärer MR, Fenner K, Robinson SL. Substrate promiscuity of xenobiotic-transforming hydrolases from stream biofilms impacted by treated wastewater. Water Research 2024; 256:121593. [PMID: 38631239 DOI: 10.1016/j.watres.2024.121593] [Received: 11/27/2023] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/19/2024]
Abstract
Organic contaminants enter aquatic ecosystems from various sources, including wastewater treatment plant effluent. Freshwater biofilms play a major role in the removal of organic contaminants from receiving water bodies, but knowledge of the molecular mechanisms driving contaminant biotransformations in complex stream biofilm (periphyton) communities remains limited. Previously, we demonstrated that biofilms in experimental flume systems grown at higher ratios of treated wastewater (WW) to stream water displayed an increased biotransformation potential for a number of organic contaminants. We identified a positive correlation between WW percentage and biofilm biotransformation rates for the widely used insect repellent N,N-diethyl-meta-toluamide (DEET) and a number of other wastewater-borne contaminants with hydrolyzable moieties. Here, we conducted deep shotgun sequencing of flume biofilms and identified a positive correlation between WW percentage and metagenomic read abundances of DEET hydrolase (DH) homologs. To test the causality of this association, we constructed a targeted metagenomic library of DH homologs from flume biofilms. We screened our complete metagenomic library for activity with four different substrates, including DEET, and a subset thereof with 183 WW-related organic compounds. The majority of active hydrolases in the metagenomic library preferred aliphatic and aromatic ester substrates while, remarkably, only a single reference enzyme was capable of DEET hydrolysis. Of the 626 total enzyme-substrate combinations tested, approximately 5% were active enzyme-substrate pairs. Metagenomic DH family homologs revealed a broad substrate promiscuity spanning 22 different compounds when summed across all enzymes tested. We biochemically characterized the most promiscuous and active enzymes identified based on metagenomic analysis from uncultivated Rhodospirillaceae and Planctomycetaceae. In addition to characterizing new DH family enzymes, we exemplified a framework for linking metagenome-guided hypothesis generation with experimental validation. Overall, this study expands the scope of known enzymatic contaminant biotransformations for metagenomic hydrolases from WW-receiving stream biofilm communities.
Affiliation(s)
- Yaochun Yu
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Niklas Ferenc Trottmann
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Milo R Schärer
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Kathrin Fenner
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland; Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
- Serina L Robinson
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
|
19
|
Partipilo M, Slotboom DJ. The S-component fold: a link between bacterial transporters and receptors. Commun Biol 2024; 7:610. [PMID: 38773269 PMCID: PMC11109136 DOI: 10.1038/s42003-024-06295-2] [Received: 01/18/2024] [Accepted: 05/06/2024] [Indexed: 05/23/2024] Open
Abstract
The processes of nutrient uptake and signal sensing are crucial for microbial survival and adaptation. Membrane-embedded proteins involved in these functions (transporters and receptors) are commonly regarded as unrelated in terms of sequence, structure, mechanism of action and evolutionary history. Here, we analyze the protein structural universe using recently developed artificial intelligence-based structure prediction tools, and find an unexpected link between prominent groups of microbial transporters and receptors. The so-called S-components of Energy-Coupling Factor (ECF) transporters and the membrane domains of sensor histidine kinases of the 5TMR cluster share a structural fold. The discovery of their relatedness manifests a widespread case of prokaryotic "transceptors" (related proteins with transport or receptor function), showcases how artificial intelligence-based structure predictions reveal uncharted evolutionary connections between proteins, and provides new avenues for engineering transport and signaling functions in bacteria.
Affiliation(s)
- Michele Partipilo
  - Department of Biochemistry, Groningen Institute of Biomolecular Sciences & Biotechnology, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
- Dirk Jan Slotboom
  - Department of Biochemistry, Groningen Institute of Biomolecular Sciences & Biotechnology, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
|
20
|
Dohnálek V, Doležal P. Installation of LYRM proteins in early eukaryotes to regulate the metabolic capacity of the emerging mitochondrion. Open Biol 2024; 14:240021. [PMID: 38772414 PMCID: PMC11293456 DOI: 10.1098/rsob.240021] [Received: 08/15/2023] [Accepted: 03/13/2024] [Indexed: 05/23/2024] Open
Abstract
Core mitochondrial processes such as the electron transport chain, protein translation and the formation of Fe-S clusters (ISC) are of prokaryotic origin and were present in the bacterial ancestor of mitochondria. In animal and fungal models, a family of small Leu-Tyr-Arg motif-containing proteins (LYRMs) uniformly regulates the function of mitochondrial complexes involved in these processes. The action of LYRMs is contingent upon their binding to the acylated form of acyl carrier protein (ACP). This study demonstrates that LYRMs are structurally and evolutionarily related proteins characterized by a core triplet of α-helices. Their widespread distribution across eukaryotes suggests that 12 specialized LYRMs were likely present in the last eukaryotic common ancestor to regulate the assembly and folding of the subunits that are conserved in bacteria but that lack LYRM homologues. The secondary reduction of mitochondria to anoxic environments has rendered the function of LYRMs and their interaction with acylated ACP dispensable. Consequently, these findings strongly suggest that early eukaryotes installed LYRMs in aerobic mitochondria as orchestrated switches, essential for regulating core metabolism and ATP production.
Affiliation(s)
- Vít Dohnálek
  - Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec 252 50, Czech Republic
- Pavel Doležal
  - Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec 252 50, Czech Republic
|
21
|
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707 PMCID: PMC11058471 DOI: 10.1093/pnasnexus/pgae158] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
Affiliation(s)
- Léa Bou Dagher
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
| | - Dominique Madern
- University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
| | - Philippe Malbos
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
| | - Céline Brochier-Armanet
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
| |
|
22
|
Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, Yang J, Zhu S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775. [PMID: 38555371 PMCID: PMC10981738 DOI: 10.1038/s41467-024-46808-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024] Open
Abstract
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains its similarity prediction model on a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds like MMseqs2 while increasing the sensitivity by more than threefold, and is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
Affiliation(s)
- Wei Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ziye Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ronghui You
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Chenghan Xie
- School of Mathematical Sciences, Fudan University, 200433, Shanghai, China
| | - Hong Wei
- School of Mathematical Sciences, Nankai University, 300071, Tianjin, China
| | - Yi Xiong
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China.
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
- Shanghai Qi Zhi Institute, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
| |
|
23
|
Fierro Morales JC, Redfearn C, Titus MA, Roh-Johnson M. Reduced PaxillinB localization to cell-substrate adhesions promotes cell migration in Dictyostelium. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.19.585764. [PMID: 38562712 PMCID: PMC10983970 DOI: 10.1101/2024.03.19.585764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Many cells adhere to extracellular matrix for efficient cell migration. This adhesion is mediated by focal adhesions, a protein complex linking the extracellular matrix to the intracellular cytoskeleton. Focal adhesions have been studied extensively in mesenchymal cells, but recent research in physiological contexts and amoeboid cells suggests that focal adhesion regulation differs from the mesenchymal focal adhesion paradigm. We used Dictyostelium discoideum to uncover new mechanisms of focal adhesion regulation, as Dictyostelium are amoeboid cells that form focal adhesion-like structures for migration. We show that PaxillinB, the Dictyostelium homologue of Paxillin, localizes to dynamic focal adhesion-like structures during Dictyostelium migration. Unexpectedly, reduced PaxillinB recruitment to these structures increases Dictyostelium cell migration. Quantitative analysis of focal adhesion size and dynamics shows that lack of PaxillinB recruitment to focal adhesions does not alter focal adhesion size, but rather increases focal adhesion turnover. These findings are in direct contrast to Paxillin function at focal adhesions during mesenchymal migration, challenging the established focal adhesion model.
Affiliation(s)
| | - Chandler Redfearn
- Department of Kinesiology, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
| | - Margaret A Titus
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
| | - Minna Roh-Johnson
- Department of Biochemistry, University of Utah, Salt Lake City, UT, 84112, USA
- Department of Kinesiology, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
| |
|
24
|
Fischer AL, Tichy A, Kokot J, Hoerschinger VJ, Wild RF, Riccabona JR, Loeffler JR, Waibl F, Quoika PK, Gschwandtner P, Forli S, Ward AB, Liedl KR, Zacharias M, Fernández-Quintero ML. The Role of Force Fields and Water Models in Protein Folding and Unfolding Dynamics. J Chem Theory Comput 2024; 20:2321-2333. [PMID: 38373307 PMCID: PMC10938642 DOI: 10.1021/acs.jctc.3c01106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/29/2024] [Accepted: 01/29/2024] [Indexed: 02/21/2024]
Abstract
Protein folding is a fascinating, not fully understood phenomenon in biology. Molecular dynamics (MD) simulations are an invaluable tool to study conformational changes in atomistic detail, including folding and unfolding processes of proteins. However, the accuracy of the conformational ensembles derived from MD simulations inevitably relies on the quality of the underlying force field in combination with the respective water model. Here, we investigate protein folding, unfolding, and misfolding of fast-folding proteins by examining different force fields with their recommended water models, i.e., ff14SB with the TIP3P model and ff19SB with the OPC model. To this end, we generated long conventional MD simulations highlighting the perks and pitfalls of these setups. Using Markov state models, we defined kinetically independent conformational substates and emphasized their distinct characteristics, as well as their corresponding state probabilities. Surprisingly, we found substantial differences in thermodynamics and kinetics of protein folding, depending on the combination of the protein force field and water model, originating primarily from the different water models. These results emphasize the importance of carefully choosing the force field and the respective water model as they determine the accuracy of the observed dynamics of folding events. Thus, the findings support the hypothesis that the water model is at least as important as the force field and hence needs to be considered in future studies investigating protein dynamics and folding in all areas of biophysics.
Affiliation(s)
- Anna-Lena M. Fischer
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Anna Tichy
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Janik Kokot
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Valentin J. Hoerschinger
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Robert F. Wild
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Jakob R. Riccabona
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Johannes R. Loeffler
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Franz Waibl
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Patrick K. Quoika
- Center for Protein Assemblies (CPA), Physics Department, Chair of Theoretical Biophysics, Technical University of Munich, D-80333 Munich, Germany
| | | | - Stefano Forli
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, California 92037, United States
| | - Andrew B. Ward
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, California 92037, United States
| | - Klaus R. Liedl
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| | - Martin Zacharias
- Center for Protein Assemblies (CPA), Physics Department, Chair of Theoretical Biophysics, Technical University of Munich, D-80333 Munich, Germany
| | - Monica L. Fernández-Quintero
- Institute for General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, A-6020 Innsbruck, Austria
| |
|
25
|
Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024; 25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024] Open
Abstract
In every omics experiment, genes or their products are identified for which even state-of-the-art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses since these proteins can carry out functions which impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of AlphaFold predictions for all UniProt proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
Affiliation(s)
- Steven Tavis
- Genome Science and Technology Graduate Program, University of Tennessee Knoxville, Knoxville, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
| |
|
26
|
Iovino BG, Ye Y. Protein embedding based alignment. BMC Bioinformatics 2024; 25:85. [PMID: 38413857 PMCID: PMC10900708 DOI: 10.1186/s12859-024-05699-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 02/12/2024] [Indexed: 02/29/2024] Open
Abstract
PURPOSE Despite much progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have used substitution matrices since their creation in the 1970s to generate alignments; however, these matrices do not work well to score alignments within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Similar to the traditional Smith-Waterman algorithm, PEbA uses a dynamic programming algorithm but the matching score of amino acids is based on the similarity of their embeddings from a protein language model. METHODS We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each one extracted from one of their multiple sequence alignments. Five different BAliBASE references were used, each with different sequence identities, motifs, and lengths, allowing PEbA to showcase how well it aligns under different circumstances. RESULTS PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments, achieving different levels of improvements of the alignment quality for pairs of sequences with different levels of similarity (over four times as well for pairs of sequences with <10% identity). We also compared PEbA with embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed protein language model embedding-based alignment methods. CONCLUSION Our results suggested that general purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.
Affiliation(s)
- Benjamin Giovanni Iovino
- Luddy School of Informatics, Computing and Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
| | - Yuzhen Ye
- Luddy School of Informatics, Computing and Engineering, Indiana University, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA.
| |
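The idea summarized in the PEbA abstract above, Smith-Waterman-style dynamic programming in which the residue match score comes from embedding similarity rather than a substitution matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear gap penalty, and the `scale`/`offset` transform that turns cosine similarity into a signed score are all assumptions chosen for illustration, and the embeddings are expected to be supplied by the caller as plain per-residue vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_local_align(emb_a, emb_b, gap=1.0, scale=2.0, offset=0.5):
    """Local alignment of two sequences given per-residue embedding vectors.

    Hypothetical scoring (an assumption, not taken from the paper):
    match score = scale * (cosine similarity - offset), linear gap penalty.
    Returns (best_score, list of aligned (index_a, index_b) pairs).
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Traceback moves: 0 = stop, 1 = diagonal (match), 2 = up (gap in b), 3 = left (gap in a)
    move = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = scale * (cosine(emb_a[i - 1], emb_b[j - 1]) - offset)
            candidates = [
                (0.0, 0),                       # restart the local alignment
                (score[i - 1][j - 1] + s, 1),   # align residue i-1 with residue j-1
                (score[i - 1][j] - gap, 2),     # gap in sequence b
                (score[i][j - 1] - gap, 3),     # gap in sequence a
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best-scoring cell until a "stop" cell is reached.
    pairs = []
    i, j = best_pos
    while move[i][j] != 0:
        if move[i][j] == 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == 2:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs
```

Called with two lists of per-residue vectors, the function returns the best local score and the aligned index pairs. The `offset` matters because raw cosine similarities between language-model embeddings tend to be positive, so some shift is needed for local alignment to penalize poor matches; the value used here is arbitrary.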
|
27
|
Wirbel J, Bhatt AS, Probst AJ. The journey to understand previously unknown microbial genes. Nature 2024; 626:267-269. [PMID: 38291331 DOI: 10.1038/d41586-024-00077-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
|
28
|
van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M. Fast and accurate protein structure search with Foldseek. Nat Biotechnol 2024; 42:243-246. [PMID: 37156916 PMCID: PMC10869269 DOI: 10.1038/s41587-023-01773-0] [Citation(s) in RCA: 447] [Impact Index Per Article: 447.0] [Received: 02/17/2022] [Accepted: 03/30/2023] [Indexed: 05/10/2023]
Abstract
As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.
Affiliation(s)
- Michel van Kempen
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
| | - Stephanie S Kim
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | | | - Milot Mirdita
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | - Jeongjae Lee
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | | | - Johannes Söding
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany.
- Campus Institute Data Science (CIDAS), Göttingen, Germany.
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea.
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea.
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea.
| |
|
29
|
Svedberg D, Winiger RR, Berg A, Sharma H, Tellgren-Roth C, Debrunner-Vossbrinck BA, Vossbrinck CR, Barandun J. Functional annotation of a divergent genome using sequence and structure-based similarity. BMC Genomics 2024; 25:6. [PMID: 38166563 PMCID: PMC10759460 DOI: 10.1186/s12864-023-09924-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 12/18/2023] [Indexed: 01/04/2024] Open
Abstract
BACKGROUND Microsporidia are a large taxon of intracellular pathogens characterized by extraordinarily streamlined genomes with unusually high sequence divergence and many species-specific adaptations. These unique factors pose challenges for traditional genome annotation methods based on sequence similarity. As a result, many of the microsporidian genomes sequenced to date contain numerous genes of unknown function. Recent innovations in rapid and accurate structure prediction and comparison, together with the growing amount of data in structural databases, provide new opportunities to assist in the functional annotation of newly sequenced genomes. RESULTS In this study, we established a workflow that combines sequence and structure-based functional gene annotation approaches employing a ChimeraX plugin named ANNOTEX (Annotation Extension for ChimeraX), allowing for visual inspection and manual curation. We employed this workflow on a high-quality telomere-to-telomere sequenced tetraploid genome of Vairimorpha necatrix. First, the 3080 predicted protein-coding DNA sequences, of which 89% were confirmed with RNA sequencing data, were used as input. Next, ColabFold was used to create protein structure predictions, followed by a Foldseek search for structural matching to the PDB and AlphaFold databases. The subsequent manual curation, using sequence and structure-based hits, increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools. Our workflow resulted in a comprehensive description of the V. necatrix genome, along with a structural summary of the most prevalent protein groups, such as the ricin B lectin family. In addition, and to test our tool, we identified the functions of several previously uncharacterized Encephalitozoon cuniculi genes. 
CONCLUSION We provide a new functional annotation tool for divergent organisms and employ it on a newly sequenced, high-quality microsporidian genome to shed light on this uncharacterized intracellular pathogen of Lepidoptera. The addition of a structure-based annotation approach can serve as a valuable template for studying other microsporidian or similarly divergent species.
Affiliation(s)
- Dennis Svedberg
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Rahel R Winiger
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
| | - Alexandra Berg
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Himanshu Sharma
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
| | - Christian Tellgren-Roth
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | | | - Charles R Vossbrinck
- Department of Environmental Science, Connecticut Agricultural Experiment Station, New Haven, CT, 06504, USA
| | - Jonas Barandun
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden.
| |
|
30
|
Jaito N, Kaewsawat N, Phetlum S, Uengwetwanit T. Metagenomic discovery of lipases with predicted structural similarity to Candida antarctica lipase B. PLoS One 2023; 18:e0295397. [PMID: 38055755 PMCID: PMC10699602 DOI: 10.1371/journal.pone.0295397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023] Open
Abstract
Here we employed sequence-based and structure-based screening to prospect for lipases that are structural homologs of Candida antarctica lipase B (CalB). CalB, a widely used biocatalyst, was used as the structural template reference because of its enzymatic properties. Structural homologs could aid in the discovery of novel wild-type enzymes with desirable features and serve as scaffolds for further biocatalyst design. The available metagenomic data isolated from various environments was leveraged as a source for bioprospecting. We identified two bacterial lipases that showed high structural similarity to CalB with <40% sequence identity. Partial purification was conducted, and the enzymatic characteristics of the two candidate lipases were examined in comparison to CalB. One candidate exhibited an optimal pH of 8 and an optimal temperature of 50°C, similar to CalB. The second candidate demonstrated an optimal pH of 8 and a higher optimal temperature of 55°C. Notably, this candidate sustained considerable activity at extreme conditions, maintaining high activity at 70°C or pH 9, contrasting with the diminished activity of CalB under similar conditions. Further comprehensive experimentation is warranted to uncover and exploit these novel enzymatic properties for practical biotechnological purposes.
Affiliation(s)
- Nongluck Jaito
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Nattha Kaewsawat
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Suthathip Phetlum
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Tanaporn Uengwetwanit
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| |
|
31
|
Robinson SL. Structure-guided metagenome mining to tap microbial functional diversity. Curr Opin Microbiol 2023; 76:102382. [PMID: 37741262 DOI: 10.1016/j.mib.2023.102382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 05/21/2023] [Accepted: 08/22/2023] [Indexed: 09/25/2023]
Abstract
Scientists now have access to millions of accurate three-dimensional (3D) models of protein structures. How do we leverage 3D structural models to learn about microbial functions encoded in metagenomes? Here, we review recent developments using protein structural features to mine metagenomes from diverse environments ranging from the human gut to soil and ocean viromes. We compare 3D protein structural methods to characterize antibiotic resistance phenotypes, nutrient cycling, and host-drug-microbe interactions. Broadly, we encourage the scientific community to look beyond global sequence and structure alignments by considering fine-grained descriptors such as distance to ligand, active site, and tertiary interactions between amino acid residues scaling to microbiomes. Finally, we highlight structure-inspired approaches to chart new areas of microbial protein-coding sequence space.
Affiliation(s)
- Serina L Robinson
- Department of Environmental Microbiology, Eawag, Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, 8600 Dübendorf, Switzerland.
| |
|
32
|
Zhang S, Zhang T, Fu Y. Proteome-wide structural analysis quantifies structural conservation across distant species. Genome Res 2023; 33:1975-1993. [PMID: 37993136 PMCID: PMC10760455 DOI: 10.1101/gr.277771.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 10/16/2023] [Indexed: 11/24/2023]
Abstract
Traditional evolutionary biology research mainly relies on sequence information to infer evolutionary relationships between genes or proteins. In contrast, protein structural information has long been overlooked, although structures are more conserved and more closely linked to function than sequences. To address this gap, we conducted a proteome-wide structural analysis using experimental and computed protein structures for organisms from the three distinct domains, including Homo sapiens (eukarya), Escherichia coli (bacteria), and Methanocaldococcus jannaschii (archaea). We reveal the distribution of structural similarity and sequence identity at the genomic level and characterize the twilight zone, where signals obtained from sequence alignment are blurred and evolutionary relationships cannot be inferred unambiguously. We find that structurally similar homologous protein pairs in the twilight zone account for ∼0.004%-0.021% of all possible protein pair combinations, which translates to ∼8%-32% of the protein-coding genes, depending on the species under comparison. In addition, by comparing the structural homologs, we show that human proteins involved in the energy supply are more similar to their E. coli homologs, whereas proteins relating to the central dogma are more similar to their M. jannaschii homologs. We also identify a bacterial GPCR homolog in the E. coli proteome that displays distinctive domain architecture. Our results shed light on the characteristics of the twilight zone and the origin of different pathways from a protein structure perspective, highlighting an exciting new frontier in evolutionary biology.
Affiliation(s)
- Shijie Zhang
- Department of Pharmacology and Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Teng Zhang
- Department of Pharmacology and Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Yuan Fu
- Department of Pharmacology and Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| |
|
33
|
Boohar RT, Vandepas LE, Traylor-Knowles N, Browne WE. Phylogenetic and Protein Structure Analyses Provide Insight into the Evolution and Diversification of the CD36 Domain "Apex" among Scavenger Receptor Class B Proteins across Eukarya. Genome Biol Evol 2023; 15:evad218. [PMID: 38035778 PMCID: PMC10715195 DOI: 10.1093/gbe/evad218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 11/07/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023] Open
Abstract
The cluster of differentiation 36 (CD36) domain defines the characteristic ectodomain associated with class B scavenger receptor (SR-B) proteins. In bilaterians, SR-Bs play critical roles in diverse biological processes including innate immunity functions such as pathogen recognition and apoptotic cell clearance, as well as metabolic sensing associated with fatty acid uptake and cholesterol transport. Although previous studies suggest this protein family is ancient, SR-B diversity across Eukarya has not been robustly characterized. We analyzed SR-B homologs identified from the genomes and transcriptomes of 165 diverse eukaryotic species. The presence of highly conserved amino acid motifs across major eukaryotic supergroups supports the presence of an SR-B homolog in the last eukaryotic common ancestor. Our comparative analyses of SR-B protein structure identify the retention of a canonical asymmetric beta barrel tertiary structure within the CD36 ectodomain across Eukarya. We also identify multiple instances of independent lineage-specific sequence expansions in the apex region of the CD36 ectodomain, a region functionally associated with ligand sensing. We hypothesize that a combination of both sequence expansion and structural variation in the CD36 apex region may reflect the evolution of SR-B ligand-sensing specificity between diverse eukaryotic clades.
Affiliation(s)
- Reed T Boohar
- Department of Biology, University of Miami, Coral Gables, Florida, USA
| | - Lauren E Vandepas
- Department of Biology, University of Miami, Coral Gables, Florida, USA
| | - Nikki Traylor-Knowles
- Department of Marine Biology and Ecology, Rosenstiel School of Marine and Atmospheric Science, University of Miami, Miami, Florida, USA
| | - William E Browne
- Department of Biology, University of Miami, Coral Gables, Florida, USA
| |
|
34
|
Himmel NJ, Moi D, Benton R. Remote homolog detection places insect chemoreceptors in a cryptic protein superfamily spanning the tree of life. Curr Biol 2023; 33:5023-5033.e4. [PMID: 37913770 DOI: 10.1016/j.cub.2023.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 09/26/2023] [Accepted: 10/06/2023] [Indexed: 11/03/2023]
Abstract
Many proteins exist in the so-called "twilight zone" of sequence alignment, where low pairwise sequence identity makes it difficult to determine homology and phylogeny.1,2 As protein tertiary structure is often more conserved,3 recent advances in ab initio protein folding have made structure-based identification of putative homologs feasible.4,5,6 We present a pipeline for the identification and characterization of distant homologs and apply it to 7-transmembrane-domain ion channels (7TMICs), a protein group founded by insect odorant and gustatory receptors. Previous sequence and limited structure-based searches identified putatively related proteins, mainly in other animals and plants.7,8,9,10 However, very few 7TMICs have been identified in non-animal, non-plant taxa. Moreover, these proteins' remarkable sequence dissimilarity made it uncertain whether disparate 7TMIC types (Gr/Or, Grl, GRL, DUF3537, PHTF, and GrlHz) are homologous or convergent, leaving their evolutionary history unresolved. Our pipeline identified thousands of new 7TMICs in archaea, bacteria, and unicellular eukaryotes. Using graph-based analyses and protein language models to extract family-wide signatures, we demonstrate that 7TMICs have structure and sequence similarity, supporting homology. Through sequence- and structure-based phylogenetics, we classify eukaryotic 7TMICs into two families (Class-A and Class-B), which are the result of a gene duplication predating the split(s) leading to Amorphea (animals, fungi, and allies) and Diaphoretickes (plants and allies). Our work reveals 7TMICs as a cryptic superfamily, with origins close to the evolution of cellular life. More generally, this study serves as a methodological proof of principle for the identification of extremely distant protein homologs.
Affiliation(s)
- Nathaniel J Himmel
- Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, 1015 Lausanne, Switzerland.
| | - David Moi
- Department of Computational Biology, Faculty of Biology and Medicine, University of Lausanne, 1015 Lausanne, Switzerland
| | - Richard Benton
- Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, 1015 Lausanne, Switzerland.
| |
|
35
|
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of these PAs and it refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, highest score PC_sim, and highest similarity with the MSAs produced by other tools and with the reference MSA Balibase. AVAILABILITY AND IMPLEMENTATION https://github.com/ugobas/PC_ali.
Affiliation(s)
- Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
| | - David Abia
- Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
| | - Oscar Piette
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
| |
|
36
|
Malik AJ, Langer D, Verma CS, Poole AM, Allison JR. Structome: a tool for the rapid assembly of datasets for structural phylogenetics. Bioinform Adv 2023; 3:vbad134. [PMID: 38046099 PMCID: PMC10692761 DOI: 10.1093/bioadv/vbad134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/17/2023] [Accepted: 09/29/2023] [Indexed: 12/05/2023]
Abstract
Summary Protein structures carry signal of common ancestry and can therefore aid in reconstructing their evolutionary histories. To expedite the structure-informed inference process, a web server, Structome, has been developed that allows users to rapidly identify protein structures similar to a query protein and to assemble datasets useful for structure-based phylogenetics. Structome was created by clustering ∼94% of the structures in RCSB PDB using 90% sequence identity and representing each cluster by a centroid structure. Structure similarity between centroid proteins was calculated, and annotations from PDB, SCOP, and CATH were integrated. To illustrate utility, an H3 histone was used as a query, and results show that the protein structures returned by Structome span both sequence and structural diversity of the histone fold. Additionally, the pre-computed nexus-formatted distance matrix, provided by Structome, enables analysis of evolutionary relationships between proteins not identifiable using searches based on sequence similarity alone. Our results demonstrate that, beginning with a single structure, Structome can be used to rapidly generate a dataset of structural neighbours and allows deep evolutionary history of proteins to be studied. Availability and Implementation Structome is available at: https://structome.bii.a-star.edu.sg.
Affiliation(s)
- Ashar J Malik
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 138671 Singapore
| | - Desiree Langer
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
| | - Chandra S Verma
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 138671 Singapore
- Department of Biological Sciences, National University of Singapore, 117543 Singapore
- School of Biological Sciences, Nanyang Technological University, 637551 Singapore
| | - Anthony M Poole
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
- Digital Life Institute, University of Auckland, Auckland 1142, New Zealand
| | - Jane R Allison
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
- Digital Life Institute, University of Auckland, Auckland 1142, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, 1142 Auckland, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, 8041 Christchurch, New Zealand
| |
|
37
|
Truong A, Myerscough D, Campbell I, Atkinson J, Silberg JJ. A cellular selection identifies elongated flavodoxins that support electron transfer to sulfite reductase. Protein Sci 2023; 32:e4746. [PMID: 37551563 PMCID: PMC10503412 DOI: 10.1002/pro.4746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 07/17/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
Flavodoxins (Flds) mediate the flux of electrons between oxidoreductases in diverse metabolic pathways. To investigate whether Flds can support electron transfer to a sulfite reductase (SIR) that evolved to couple with a ferredoxin, we evaluated the ability of Flds to transfer electrons from a ferredoxin-NADP reductase (FNR) to a ferredoxin-dependent SIR using growth complementation of an Escherichia coli strain with a sulfur metabolism defect. We show that Flds from cyanobacteria complement this growth defect when coexpressed with an FNR and an SIR that evolved to couple with a plant ferredoxin. When we evaluated the effect of peptide insertion on Fld-mediated electron transfer, we observed a sensitivity to insertions within regions predicted to be proximal to the cofactor and partner binding sites, while a high insertion tolerance was detected within loops distal from the cofactor and within regions of helices and sheets that are proximal to those loops. Bioinformatic analysis showed that natural Fld sequence variability predicts a large fraction of the motifs that tolerate insertion of the octapeptide SGRPGSLS. These results represent the first evidence that Flds can support electron transfer to assimilatory SIRs, and they suggest that the pattern of insertion tolerance is influenced by interactions with oxidoreductase partners.
Affiliation(s)
- Albert Truong
- Biochemistry and Cell Biology Graduate Program, Rice University, Houston, Texas, USA
- Department of Biosciences, Rice University, Houston, Texas, USA
| | | | - Ian Campbell
- Department of Biosciences, Rice University, Houston, Texas, USA
| | | | - Jonathan J. Silberg
- Department of Biosciences, Rice University, Houston, Texas, USA
- Department of Bioengineering, Rice University, Houston, Texas, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
| |
|
38
|
Koch TL, Torres JP, Baskin RP, Salcedo PF, Chase K, Olivera BM, Safavi-Hemami H. A toxin-based approach to neuropeptide and peptide hormone discovery. Front Mol Neurosci 2023; 16:1176662. [PMID: 37720554 PMCID: PMC10501145 DOI: 10.3389/fnmol.2023.1176662] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Accepted: 08/15/2023] [Indexed: 09/19/2023] Open
Abstract
Peptide hormones and neuropeptides form a diverse class of bioactive secreted molecules that control essential processes in animals. Despite breakthroughs in peptide discovery, many signaling peptides remain undiscovered. Recently, we demonstrated the use of somatostatin-mimicking toxins from cone snails to identify the invertebrate ortholog of somatostatin. Here, we show that this toxin-based approach can be systematically applied to discover other unknown secretory peptides that are likely to have signaling function. Using large sequencing datasets, we searched for homologies between cone snail toxins and secreted proteins from the snails' prey. We identified and confirmed expression of five toxin families that share strong similarities with unknown secretory peptides from mollusks and annelids and in one case also from ecdysozoans. Based on several lines of evidence we propose that these peptides likely act as signaling peptides that serve important physiological functions. Indeed, we confirmed that one of the identified peptides belongs to the family of crustacean hyperglycemic hormone, a peptide not previously observed in Spiralia. We propose that this discovery pipeline can be broadly applied to other systems in which one organism has evolved molecules to manipulate the physiology of another.
Affiliation(s)
- Thomas Lund Koch
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Biochemistry, University of Utah, Salt Lake City, UT, United States
| | - Joshua P. Torres
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Robert P. Baskin
- School of Biological Sciences, University of Utah, Salt Lake City, UT, United States
- The Ohio State University College of Medicine, Columbus, OH, United States
| | - Paula Flórez Salcedo
- Department of Neurobiology, University of Utah, Salt Lake City, UT, United States
| | - Kevin Chase
- School of Biological Sciences, University of Utah, Salt Lake City, UT, United States
| | - Baldomero M. Olivera
- School of Biological Sciences, University of Utah, Salt Lake City, UT, United States
| | - Helena Safavi-Hemami
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Biochemistry, University of Utah, Salt Lake City, UT, United States
- School of Biological Sciences, University of Utah, Salt Lake City, UT, United States
| |
|
39
|
Suskiewicz MJ, Munnur D, Strømland Ø, Yang JC, Easton L, Chatrin C, Zhu K, Baretić D, Goffinont S, Schuller M, Wu WF, Elkins J, Ahel D, Sanyal S, Neuhaus D, Ahel I. Updated protein domain annotation of the PARP protein family sheds new light on biological function. Nucleic Acids Res 2023; 51:8217-8236. [PMID: 37326024 PMCID: PMC10450202 DOI: 10.1093/nar/gkad514] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 02/10/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/17/2023] Open
Abstract
AlphaFold2 and related computational tools have greatly aided studies of structural biology through their ability to accurately predict protein structures. In the present work, we explored AF2 structural models of the 17 canonical members of the human PARP protein family and supplemented this analysis with new experiments and an overview of recent published data. PARP proteins are typically involved in the modification of proteins and nucleic acids through mono or poly(ADP-ribosyl)ation, but this function can be modulated by the presence of various auxiliary protein domains. Our analysis provides a comprehensive view of the structured domains and long intrinsically disordered regions within human PARPs, offering a revised basis for understanding the function of these proteins. Among other functional insights, the study provides a model of PARP1 domain dynamics in the DNA-free and DNA-bound states and enhances the connection between ADP-ribosylation and RNA biology and between ADP-ribosylation and ubiquitin-like modifications by predicting putative RNA-binding domains and E2-related RWD domains in certain PARPs. In line with the bioinformatic analysis, we demonstrate for the first time PARP14's RNA-binding capability and RNA ADP-ribosylation activity in vitro. While our insights align with existing experimental data and are probably accurate, they need further validation through experiments.
Affiliation(s)
| | - Deeksha Munnur
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - Øyvind Strømland
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Ji-Chun Yang
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Laura E Easton
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Chatrin Chatrin
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - Kang Zhu
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - Domagoj Baretić
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | | | - Marion Schuller
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - Wing-Fung Wu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Jonathan M Elkins
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, UK
| | - Dragana Ahel
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - Sumana Sanyal
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| | - David Neuhaus
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Ivan Ahel
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
| |
|
40
|
Cretin G, Périn C, Zimmermann N, Galochkina T, Gelly JC. ICARUS: flexible protein structural alignment based on Protein Units. Bioinformatics 2023; 39:btad459. [PMID: 37498544 PMCID: PMC10400377 DOI: 10.1093/bioinformatics/btad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/04/2023] [Accepted: 07/26/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION Alignment of protein structures is a major problem in structural biology. The first approach commonly used is to consider proteins as rigid bodies. However, alignment of protein structures can be very complex due to conformational variability, or complex evolutionary relationships between proteins such as insertions, circular permutations or repetitions. In such cases, introducing flexibility becomes useful for two reasons: (i) it can help compare two protein chains which adopted two different conformational states, such as due to proteins/ligands interaction or post-translational modifications, and (ii) it aids in the identification of conserved regions in proteins that may have distant evolutionary relationships. RESULTS We propose ICARUS, a new approach for flexible structural alignment based on identification of Protein Units, evolutionarily preserved structural descriptors of intermediate size, between secondary structures and domains. ICARUS significantly outperforms reference methods on a dataset of very difficult structural alignments. AVAILABILITY AND IMPLEMENTATION Code is freely available online at https://github.com/DSIMB/ICARUS.
Affiliation(s)
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Charlotte Périn
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- TBI, Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
| | - Nicolas Zimmermann
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| |
|
41
|
Kakoulidis P, Vlachos IS, Thanos D, Blatch GL, Emiris IZ, Anastasiadou E. Identifying and profiling structural similarities between Spike of SARS-CoV-2 and other viral or host proteins with Machaon. Commun Biol 2023; 6:752. [PMID: 37468602 PMCID: PMC10356814 DOI: 10.1038/s42003-023-05076-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 06/26/2023] [Indexed: 07/21/2023] Open
Abstract
Using protein structure to predict function, interactions, and evolutionary history is still an open challenge, with existing approaches relying extensively on protein homology and families. Here, we present Machaon, a data-driven method combining orientation invariant metrics on phi-psi angles, inter-residue contacts and surface complexity. It can be readily applied on whole structures or segments, such as domains and binding sites. Machaon was applied on SARS-CoV-2 Spike monomers of native, Delta and Omicron variants and identified correlations with a wide range of viral proteins from close to distant taxonomy ranks, as well as host proteins, such as ACE2 receptor. Machaon's meta-analysis of the results highlights structural, chemical and transcriptional similarities between the Spike monomer and human proteins, indicating a multi-level viral mimicry. This extended analysis also revealed relationships of the Spike protein with biological processes such as ubiquitination and angiogenesis and highlighted different patterns in virus attachment among the studied variants. Available at: https://machaonweb.com.
Affiliation(s)
- Panos Kakoulidis
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Ioannis S Vlachos
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main St., Cambridge, MA, 02142, USA
- Cancer Research Institute, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
- Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Dana Building, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
| | - Dimitris Thanos
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Gregory L Blatch
- Biomedical Biotechnology Research Unit, Department of Biochemistry and Microbiology, Rhodes University, PO Box 94, Makhanda (Grahamstown) 6140, Eastern Cape, South Africa
- Biomedical and Drug Discovery Research Group, Faculty of Health Sciences, Higher Colleges of Technology, PO 25026, Sharjah, UAE
- Institute for Health and Sport, Victoria University, Melbourne, PO Box 14428, VIC 8001, Melbourne, Australia
- The Vice Chancellery, The University of Notre Dame Australia, PO Box 1225, WA 6959, Fremantle, Australia
| | - Ioannis Z Emiris
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- ATHENA Research and Innovation Center, Artemidos 6 & Epidavrou 15125, Marousi, Greece
| | - Ema Anastasiadou
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece.
| |
|
42
|
Pan A, Zeng Y, Liu J, Zhou M, Lai EC, Yu Y. Unanticipated broad phylogeny of BEN DNA-binding domains revealed by structural homology searches. Curr Biol 2023; 33:2270-2282.e2. [PMID: 37236184 PMCID: PMC10348805 DOI: 10.1016/j.cub.2023.05.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/17/2023] [Revised: 04/07/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023]
Abstract
Organization of protein sequences into domain families is a foundation for cataloging and investigating protein functions. However, long-standing strategies based on primary amino acid sequences are blind to the possibility that proteins with dissimilar sequences could have comparable tertiary structures. Building on our recent findings that in silico structural predictions of BEN family DNA-binding domains closely resemble their experimentally determined crystal structures, we exploited the AlphaFold2 database for comprehensive identification of BEN domains. Indeed, we identified numerous novel BEN domains, including members of new subfamilies. For example, while no BEN domain factors had previously been annotated in C. elegans, this species actually encodes multiple BEN proteins. These include key developmental timing genes of orphan domain status, sel-7 and lin-14, the latter being the central target of the founding miRNA lin-4. We also reveal that the domain of unknown function 4806 (DUF4806), which is widely distributed across metazoans, is structurally similar to BEN and comprises a new subtype. Surprisingly, we find that BEN domains resemble both metazoan and non-metazoan homeodomains in 3D conformation and preserve characteristic residues, indicating that despite their inability to be aligned by conventional methods, these DNA-binding modules are probably evolutionarily related. Finally, we broaden the application of structural homology searches by revealing novel human members of DUF3504, which exists on diverse proteins with presumed or known nuclear functions. Overall, our work strongly expands this recently identified family of transcription factors and illustrates the value of 3D structural predictions to annotate protein domains and interpret their functions.
Affiliation(s)
- Anyu Pan
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Yangfan Zeng
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Jingjing Liu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Mengjie Zhou
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Eric C Lai
- Developmental Biology Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Yang Yu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
| |
|
43
|
Ruperti F, Papadopoulos N, Musser JM, Mirdita M, Steinegger M, Arendt D. Cross-phyla protein annotation by structural prediction and alignment. Genome Biol 2023; 24:113. [PMID: 37173746 PMCID: PMC10176882 DOI: 10.1186/s13059-023-02942-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/02/2022] [Accepted: 04/18/2023] [Indexed: 05/15/2023] Open
Abstract
BACKGROUND Protein annotation is a major goal in molecular biology, yet experimentally determined knowledge is typically limited to a few model organisms. In non-model species, the sequence-based prediction of gene orthology can be used to infer protein identity; however, this approach loses predictive power at longer evolutionary distances. Here we propose a workflow for protein annotation using structural similarity, exploiting the fact that similar protein structures often reflect homology and are more conserved than protein sequences. RESULTS We propose a workflow of openly available tools for the functional annotation of proteins via structural similarity (MorF: MorphologFinder) and use it to annotate the complete proteome of a sponge. Sponges are highly relevant for inferring the early history of animals, yet their proteomes remain sparsely annotated. MorF accurately predicts the functions of proteins with known homology in [Formula: see text] cases and annotates an additional [Formula: see text] of the proteome beyond standard sequence-based methods. We uncover new functions for sponge cell types, including extensive FGF, TGF, and Ephrin signaling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes specific to the enigmatic sponge mesocytes, proposing they function to digest cell walls. CONCLUSIONS Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence similarity searches to identify homologous proteins over long evolutionary distances. We anticipate this will be a powerful approach that boosts discovery in numerous -omics datasets, especially for non-model organisms.
Affiliation(s)
- Fabian Ruperti
  - Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Faculty of Biosciences, Collaboration for joint Ph.D. degree between EMBL and Heidelberg University, Heidelberg, Germany
- Nikolaos Papadopoulos
  - Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Department for Evolutionary Biology, University of Vienna, Vienna, Austria
- Jacob M Musser
  - Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Milot Mirdita
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Detlev Arendt
  - Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
|
44
|
Maiti S, Nazmeen A, Banerjee A. Significant impact of redox regulation of estrogen-metabolizing proteins on cellular stress responses. Cell Biochem Funct 2023. [PMID: 37139830 DOI: 10.1002/cbf.3796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 04/07/2023] [Accepted: 04/17/2023] [Indexed: 05/05/2023]
Abstract
The ultimate driving force, stress, promotes adaptability/evolution in proliferating organisms, transforming tumorigenic growth. Estradiol (E2) regulates both phenomena. In this study, bioinformatics tools, site-directed mutagenesis of human estrogen sulfotransferase (hSULT1E1), and HepG2 cells treated with N-acetyl-cysteine (NAC, a thiol inducer) or buthionine sulfoximine (BSO, a thiol depletor) were used to evaluate the functions of hSULT1E1 (estradiol-sulfating/inactivating). Reciprocal redox regulation of steroid sulfatase (STS, E2-desulfating/activating) results in the Cys-formylglycine transition by the formylglycine-forming enzyme (FGE). The enzyme sequences and structures were examined across the phylogeny. Motifs/domains, conserved catalytic sequences and protein surface topography (CASTp) were investigated. E2 binding to SULT1E1 suggests that the conserved catalytic domain of this enzyme has a critical cysteine at position 83. This is strongly supported by site-directed mutagenesis and HepG2 cell experiments. Molecular docking and superimposition studies of E2 with the SULT1E1 of representative species and with STS reinforce this hypothesis. SULT1E1 and STS are reciprocally activated in response to the cellular redox environment through the critical Cys residues of these two enzymes. The importance of E2 in organism/species proliferation and tissue tumorigenesis is highlighted.
Affiliation(s)
- Smarajit Maiti
  - Department of Biochemistry, Cell & Molecular Therapeutics Lab, Oriental Institute of Science & Technology, Midnapore, India
- Aarifa Nazmeen
  - Department of Biochemistry, Cell & Molecular Therapeutics Lab, Oriental Institute of Science & Technology, Midnapore, India
- Amrita Banerjee
  - Department of Biochemistry, Cell & Molecular Therapeutics Lab, Oriental Institute of Science & Technology, Midnapore, India
|
45
|
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023] Open
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, by definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2.
Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (OmegaFold, ESMFold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence.
Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn, which was recently found to outperform most other predictors in a comparative assessment study. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins.
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
  - Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany
|
46
|
Benton R, Himmel NJ. Structural screens identify candidate human homologs of insect chemoreceptors and cryptic Drosophila gustatory receptor-like proteins. eLife 2023; 12:85537. [PMID: 36803935 PMCID: PMC9998090 DOI: 10.7554/elife.85537] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/15/2022] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Insect odorant receptors and gustatory receptors define a superfamily of seven transmembrane domain ion channels (referred to here as 7TMICs), with homologs identified across Animalia except Chordata. Previously, we used sequence-based screening methods to reveal conservation of this family in unicellular eukaryotes and plants (DUF3537 proteins) (Benton et al., 2020). Here, we combine three-dimensional structure-based screening, ab initio protein folding predictions, phylogenetics, and expression analyses to characterize additional candidate homologs with tertiary but little or no primary structural similarity to known 7TMICs, including proteins in disease-causing Trypanosoma. Unexpectedly, we identify structural similarity between 7TMICs and PHTF proteins, a deeply conserved family of unknown function, whose human orthologs display enriched expression in testis, cerebellum, and muscle. We also discover divergent groups of 7TMICs in insects, which we term the gustatory receptor-like (Grl) proteins. Several Drosophila melanogaster Grls display selective expression in subsets of taste neurons, suggesting that they are previously unrecognized insect chemoreceptors. Although we cannot exclude the possibility of remarkable structural convergence, our findings support the origin of 7TMICs in a eukaryotic common ancestor, counter previous assumptions of complete loss of 7TMICs in Chordata, and highlight the extreme evolvability of this protein fold, which likely underlies its functional diversification in different cellular contexts.
Affiliation(s)
- Richard Benton
  - Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Nathaniel J Himmel
  - Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
|
47
|
Smoniewski CM, Borujeni PM, Petersen A, Hampton M, Salavati R, Zimmer SL. Circular mitochondrial-encoded mRNAs are a distinct subpopulation of mitochondrial mRNA in Trypanosoma brucei. bioRxiv 2023:2023.02.10.528059. [PMID: 36798374 DOI: 10.1101/2023.01.18.524644] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/28/2023]
Abstract
Since the first identification of circular RNA (circRNA) in viral-like systems, reports of circRNAs and their functions in various organisms, cell types, and organelles have greatly expanded. Here, we report the first evidence of circular mRNA in the mitochondrion of the eukaryotic parasite, Trypanosoma brucei. While using a circular RT-PCR technique developed to sequence mRNA tails of mitochondrial transcripts, we found that some mRNAs are circularized without an in vitro circularization step normally required to produce PCR products. Starting from total in vitro circularized RNA and in vivo circRNA, we high-throughput sequenced three transcripts from the 3' end of the coding region, through the 3' tail, to the 5' start of the coding region. We found that fewer reads in the circRNA libraries contained tails than in the total RNA libraries. When tails were present on circRNAs, they were shorter and less adenine-rich than the total population of RNA tails of the same transcript. Additionally, using hidden Markov modelling we determined that enzymatic activity during tail addition is different for circRNAs than for total RNA. Lastly, circRNA UTRs tended to be shorter and more variable than those of the same transcript sequenced from total RNA. We propose a revised model of Trypanosome mitochondrial tail addition, in which a fraction of mRNAs is circularized prior to the addition of adenine-rich tails and may act as a new regulatory molecule or in a degradation pathway.
Affiliation(s)
- Clara M Smoniewski
  - Department of Biomedical Sciences, University of Minnesota Medical School Duluth Campus, Duluth, MN, USA
- Austin Petersen
  - Department of Biology, University of Minnesota Duluth, Duluth, MN, USA
- Marshall Hampton
  - Department of Mathematics and Statistics, University of Minnesota Duluth, Duluth, MN, USA
- Reza Salavati
  - Institute of Parasitology, McGill University, Quebec, Canada
- Sara L Zimmer
  - Department of Biomedical Sciences, University of Minnesota Medical School Duluth Campus, Duluth, MN, USA
|
48
|
Arguelles J, Lee J, Cardenas LV, Govind S, Singh S. In Silico Analysis of a Drosophila Parasitoid Venom Peptide Reveals Prevalence of the Cation-Polar-Cation Clip Motif in Knottin Proteins. Pathogens 2023; 12:pathogens12010143. [PMID: 36678491 PMCID: PMC9865768 DOI: 10.3390/pathogens12010143] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] Open
Abstract
As generalist parasitoid wasps, Leptopilina heterotoma are highly successful on many species of fruit flies of the genus Drosophila. The parasitoids produce specialized multi-strategy extracellular vesicle (EV)-like structures in their venom. Proteomic analysis identified several immunity-associated proteins, including the knottin peptide, LhKNOT, containing the structurally conserved inhibitor cysteine knot (ICK) fold, which is present in proteins from diverse taxa. Our structural and docking analysis of LhKNOT's 36-residue core knottin fold revealed that in addition to the knottin motif itself, it also possesses a Cation-Polar-Cation (CPC) clip. The CPC clip motif is thought to facilitate antimicrobial activity in heparin-binding proteins. Surprisingly, a majority of ICKs tested also possess the CPC clip motif, including 75 bona fide plant and arthropod knottin proteins that share high sequence and/or structural similarity with LhKNOT. Like LhKNOT and these other 75 knottin proteins, even the Drosophila Drosomycin antifungal peptide, a canonical target gene of the fly's Toll-NF-kappa B immune pathway, contains this CPC clip motif. Together, our results suggest a possible defensive function for the parasitoid LhKNOT. The prevalence of the CPC clip motif, intrinsic to the cysteine knot within the knottin proteins examined here, suggests that the resultant 3D topology is important for their biochemical functions. The CPC clip is likely a highly conserved structural motif found in many diverse proteins with reported heparin binding capacity, including amyloid proteins. Knottins are targets for therapeutic drug development, and insights into their structure-function relationships will advance novel drug design.
Affiliation(s)
- Joseph Arguelles
  - Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
- Jenny Lee
  - Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
- Lady V. Cardenas
  - Department of Biology, The City College of New York, New York, NY 10031, USA
- Shubha Govind
  - Department of Biology, The City College of New York, New York, NY 10031, USA
  - PhD Program in Biochemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA
  - PhD Program in Biology, The Graduate Center of the City University of New York, New York, NY 10016, USA
- Shaneen Singh
  - Department of Biology, Brooklyn College, Brooklyn, NY 11210, USA
  - PhD Program in Biochemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA
  - PhD Program in Biology, The Graduate Center of the City University of New York, New York, NY 10016, USA
|
49
|
Zea DJ, Teppa E, Marino-Buslje C. Easy Not Easy: Comparative Modeling with High-Sequence Identity Templates. Methods Mol Biol 2023; 2627:83-100. [PMID: 36959443 DOI: 10.1007/978-1-0716-2974-1_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Homology modeling is the most common technique to build structural models of a target protein based on the structure of proteins with high-sequence identity and available high-resolution structures. This technique is based on the idea that protein structure shows fewer changes than sequence through evolution. While in this scenario single mutations would minimally perturb the structure, experimental evidence shows otherwise: proteins with high conformational diversity impose a limit on the paradigm of comparative modeling, as the same protein sequence can adopt dissimilar three-dimensional structures. These cases present challenges for modeling; at first glance, they may seem to be easy cases, but they have a complexity that is not evident at the sequence level. In this chapter, we address the following questions: Why should we care about conformational diversity? How can conformational diversity be considered in a practical way when doing template-based modeling?
Affiliation(s)
- Diego Javier Zea
  - Laboratory of Computational and Quantitative Biology, LCQB, UMR 7238 CNRS, IBPS, Sorbonne Université, Paris, France
- Elin Teppa
  - Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRA, INSA, Toulouse, France
|
50
|
Kawasaki J, Tomonaga K, Horie M. Large-scale investigation of zoonotic viruses in the era of high-throughput sequencing. Microbiol Immunol 2023; 67:1-13. [PMID: 36259224 DOI: 10.1111/1348-0421.13033] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2022] [Revised: 09/28/2022] [Accepted: 10/16/2022] [Indexed: 01/10/2023]
Abstract
Zoonotic diseases considerably impact public health and socioeconomics. RNA viruses reportedly caused approximately 94% of zoonotic diseases documented from 1990 to 2010, emphasizing the importance of investigating RNA viruses in animals. Furthermore, it has been estimated that hundreds of thousands of animal viruses capable of infecting humans are yet to be discovered, which underscores how incomplete our understanding of viral diversity remains. High-throughput sequencing (HTS) has enabled the identification of viral infections with relatively little bias. Viral searches by HTS using both symptomatic and asymptomatic animal samples have revealed hidden viral infections. This review introduces the history of viral searches using HTS, current analytical limitations, and future potential. We primarily summarize recent research on large-scale investigations of viral infections that reuse HTS data from public databases. Furthermore, considering the accumulation of uncultivated viruses, we discuss current studies and challenges for connecting viral sequences to their phenotypes using various approaches: performing data analysis, developing predictive modeling, or implementing high-throughput platforms of virological experiments. We believe that this article provides a future direction in large-scale investigations of potential zoonotic viruses using the HTS technology.
Affiliation(s)
- Junna Kawasaki
  - Laboratory of RNA Viruses, Department of Virus Research, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
  - Laboratory of RNA Viruses, Department of Mammalian Regulatory Network, Graduate School of Biostudies, Kyoto University, Kyoto, Japan
  - Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Keizo Tomonaga
  - Laboratory of RNA Viruses, Department of Virus Research, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
  - Laboratory of RNA Viruses, Department of Mammalian Regulatory Network, Graduate School of Biostudies, Kyoto University, Kyoto, Japan
  - Department of Molecular Virology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Masayuki Horie
  - Division of Veterinary Sciences, Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Osaka, Japan
  - Osaka International Research Center for Infectious Diseases, Osaka Prefecture University, Osaka, Japan
|