1
Waman VP, Bordin N, Alcraft R, Vickerstaff R, Rauer C, Chan Q, Sillitoe I, Yamamori H, Orengo C. CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds. J Mol Biol 2024; 436:168551. [PMID: 38548261 DOI: 10.1016/j.jmb.2024.168551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Rachel Alcraft
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Robert Vickerstaff
  - Advanced Research Computing Centre, University College London, London, United Kingdom
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Qian Chan
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Hazuki Yamamori
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom

2
Waman VP, Ashford P, Lam SD, Sen N, Abbasian M, Woodridge L, Goldtzvik Y, Bordin N, Wu J, Sillitoe I, Orengo CA. Predicting human and viral protein variants affecting COVID-19 susceptibility and repurposing therapeutics. Sci Rep 2024; 14:14208. [PMID: 38902252 PMCID: PMC11190248 DOI: 10.1038/s41598-024-61541-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 05/07/2024] [Indexed: 06/22/2024] Open
Abstract
The COVID-19 disease is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed-up entry and replication into the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2: human protein 3D-complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key-immunity associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. 
Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Paul Ashford
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Su Datt Lam
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Laurel Woodridge
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Yonathan Goldtzvik
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Jiaxin Wu
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK

3
Andradi-Brown C, Wichers-Misterek JS, von Thien H, Höppner YD, Scholz JAM, Hansson H, Filtenborg Hocke E, Gilberger TW, Duffy MF, Lavstsen T, Baum J, Otto TD, Cunnington AJ, Bachmann A. A novel computational pipeline for var gene expression augments the discovery of changes in the Plasmodium falciparum transcriptome during transition from in vivo to short-term in vitro culture. eLife 2024; 12:RP87726. [PMID: 38270586 PMCID: PMC10945709 DOI: 10.7554/elife.87726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024] Open
Abstract
The pathogenesis of severe Plasmodium falciparum malaria involves cytoadhesive microvascular sequestration of infected erythrocytes, mediated by P. falciparum erythrocyte membrane protein 1 (PfEMP1). PfEMP1 variants are encoded by the highly polymorphic family of var genes, the sequences of which are largely unknown in clinical samples. Previously, we published new approaches for var gene profiling and classification of predicted binding phenotypes in clinical P. falciparum isolates (Wichers et al., 2021), which represented a major technical advance. Building on this, we report here a novel method for var gene assembly and multidimensional quantification from RNA-sequencing that outperforms the earlier approach of Wichers et al., 2021, on both laboratory and clinical isolates across a combination of metrics. Importantly, the tool can interrogate the var transcriptome in context with the rest of the transcriptome and can be applied to enhance our understanding of the role of var genes in malaria pathogenesis. We applied this new method to investigate changes in var gene expression through early transition of parasite isolates to in vitro culture, using paired sets of ex vivo samples from our previous study, cultured for up to three generations. In parallel, changes in non-polymorphic core gene expression were investigated. Modest but unpredictable var gene switching and convergence towards var2csa were observed in culture, along with differential expression of 19% of the core transcriptome between paired ex vivo and generation 1 samples. Our results cast doubt on the validity of the common practice of using short-term cultured parasites to make inferences about in vivo phenotype and behaviour.
Affiliation(s)
- Clare Andradi-Brown
  - Section of Paediatric Infectious Disease, Department of Infectious Disease, Imperial College London, London, United Kingdom
  - Department of Life Sciences, Imperial College London, South Kensington, London, United Kingdom
  - Centre for Paediatrics and Child Health, Imperial College London, London, United Kingdom
- Jan Stephan Wichers-Misterek
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
  - Centre for Structural Systems Biology, Hamburg, Germany
  - Biology Department, University of Hamburg, Hamburg, Germany
- Heidrun von Thien
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
  - Centre for Structural Systems Biology, Hamburg, Germany
  - Biology Department, University of Hamburg, Hamburg, Germany
- Yannick D Höppner
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
  - Centre for Structural Systems Biology, Hamburg, Germany
  - Biology Department, University of Hamburg, Hamburg, Germany
- Judith AM Scholz
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
- Helle Hansson
  - Center for Medical Parasitology, Department of Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark
  - Department of Infectious Diseases, Copenhagen University Hospital, Copenhagen, Denmark
- Emma Filtenborg Hocke
  - Center for Medical Parasitology, Department of Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark
  - Department of Infectious Diseases, Copenhagen University Hospital, Copenhagen, Denmark
- Tim Wolf Gilberger
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
  - Centre for Structural Systems Biology, Hamburg, Germany
  - Biology Department, University of Hamburg, Hamburg, Germany
- Michael F Duffy
  - Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
- Thomas Lavstsen
  - Center for Medical Parasitology, Department of Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark
  - Department of Infectious Diseases, Copenhagen University Hospital, Copenhagen, Denmark
- Jake Baum
  - Department of Life Sciences, Imperial College London, South Kensington, London, United Kingdom
  - School of Biomedical Sciences, Faculty of Medicine & Health, UNSW, Kensington, Sydney, Australia
- Thomas D Otto
  - School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, United Kingdom
- Aubrey J Cunnington
  - Section of Paediatric Infectious Disease, Department of Infectious Disease, Imperial College London, London, United Kingdom
  - Centre for Paediatrics and Child Health, Imperial College London, London, United Kingdom
- Anna Bachmann
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Hamburg, Germany
  - Centre for Structural Systems Biology, Hamburg, Germany
  - Biology Department, University of Hamburg, Hamburg, Germany
  - German Center for Infection Research (DZIF), partner site Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany

4
Lou YC, Rubin BE, Schoelmerich MC, DiMarco KS, Borges AL, Rovinsky R, Song L, Doudna JA, Banfield JF. Infant microbiome cultivation and metagenomic analysis reveal Bifidobacterium 2'-fucosyllactose utilization can be facilitated by coexisting species. Nat Commun 2023; 14:7417. [PMID: 37973815 PMCID: PMC10654741 DOI: 10.1038/s41467-023-43279-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/01/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open
Abstract
The early-life gut microbiome development has long-term health impacts and can be influenced by factors such as infant diet. Human milk oligosaccharides (HMOs), an essential component of breast milk that can only be metabolized by some beneficial gut microorganisms, ensure proper gut microbiome establishment and infant development. However, how HMOs are metabolized by gut microbiomes is not fully elucidated. Isolate studies have revealed the genetic basis for HMO metabolism, but they exclude the possibility of HMO assimilation via synergistic interactions involving multiple organisms. Here, we investigate microbiome responses to 2'-fucosyllactose (2'FL), a prevalent HMO and a common infant formula additive, by establishing individualized microbiomes using fecal samples from three infants as the inocula. Bifidobacterium breve, a prominent member of infant microbiomes, typically cannot metabolize 2'FL. Using metagenomic data, we predict that extracellular fucosidases encoded by co-existing members such as Ruminococcus gnavus initiate 2'FL breakdown, thus critical for B. breve's growth. Using both targeted co-cultures and by supplementation of R. gnavus into one microbiome, we show that R. gnavus can promote extensive growth of B. breve through the release of lactose from 2'FL. Overall, microbiome cultivation combined with genome-resolved metagenomics demonstrates that HMO utilization can vary with an individual's microbiome.
Affiliation(s)
- Yue Clare Lou
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Benjamin E Rubin
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Marie C Schoelmerich
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Department of Environmental Systems Sciences, ETH Zurich, Zurich, Switzerland
- Kaden S DiMarco
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Adair L Borges
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Rachel Rovinsky
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Leo Song
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Jennifer A Doudna
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - Department of Chemistry, University of California, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
- Jillian F Banfield
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA

5
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
  - University of Bristol, Bristol, United Kingdom
- Harry Scholes
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
  - Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom

6
Carter MM, Olm MR, Merrill BD, Dahan D, Tripathi S, Spencer SP, Yu FB, Jain S, Neff N, Jha AR, Sonnenburg ED, Sonnenburg JL. Ultra-deep sequencing of Hadza hunter-gatherers recovers vanishing gut microbes. Cell 2023; 186:3111-3124.e13. [PMID: 37348505 PMCID: PMC10330870 DOI: 10.1016/j.cell.2023.05.046] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 08/08/2022] [Revised: 02/12/2023] [Accepted: 05/26/2023] [Indexed: 06/24/2023]
Abstract
The gut microbiome modulates immune and metabolic health. Human microbiome data are biased toward industrialized populations, limiting our understanding of non-industrialized microbiomes. Here, we performed ultra-deep metagenomic sequencing on 351 fecal samples from the Hadza hunter-gatherers of Tanzania and comparative populations in Nepal and California. We recovered 91,662 genomes of bacteria, archaea, bacteriophages, and eukaryotes, 44% of which are absent from existing unified datasets. We identified 124 gut-resident species vanishing in industrialized populations and highlighted distinct aspects of the Hadza gut microbiome related to in situ replication rates, signatures of selection, and strain sharing. Industrialized gut microbes were found to be enriched in genes associated with oxidative stress, possibly a result of microbiome adaptation to inflammatory processes. This unparalleled view of the Hadza gut microbiome provides a valuable resource, expands our understanding of microbes capable of colonizing the human gut, and clarifies the extensive perturbation induced by the industrialized lifestyle.
Affiliation(s)
- Matthew M Carter
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Matthew R Olm
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Bryan D Merrill
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Dylan Dahan
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Surya Tripathi
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Sean P Spencer
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Feiqiao B Yu
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Sunit Jain
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Norma Neff
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Aashish R Jha
  - Genetic Heritage Group, Program in Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Erica D Sonnenburg
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
- Justin L Sonnenburg
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94304, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
  - Center for Human Microbiome Studies, Stanford University School of Medicine, Stanford, CA 94304, USA

7
Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023; 39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023] Open
Abstract
MOTIVATION Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. RESULTS Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION FAS is available as python package: https://pypi.org/project/greedyFAS/.
Affiliation(s)
- Julian Dosch
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Holger Bergmann
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Vinh Tran
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, 60325, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt, 60325, Germany

8
Rios-Martinez C, Bhattacharya N, Amini AP, Crawford L, Yang KK. Deep self-supervised learning for biosynthetic gene cluster detection and product classification. PLoS Comput Biol 2023; 19:e1011162. [PMID: 37220151 PMCID: PMC10241353 DOI: 10.1371/journal.pcbi.1011162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 06/05/2023] [Accepted: 05/07/2023] [Indexed: 05/25/2023] Open
Abstract
Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, there has been an increase of complete microbial isolate genomes and metagenomes, from which a vast number of BGCs are undiscovered. Here, we introduce a self-supervised learning approach designed to identify and characterize BGCs from such data. To do this, we represent BGCs as chains of functional protein domains and train a masked language model on these domains. We assess the ability of our approach to detect BGCs and characterize BGC properties in bacterial genomes. We also demonstrate that our model can learn meaningful representations of BGCs and their constituent domains, detect BGCs in microbial genomes, and predict BGC product classes. These results highlight self-supervised neural networks as a promising framework for improving BGC prediction and classification.
Affiliation(s)
- Carolina Rios-Martinez
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Nicholas Bhattacharya
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Mathematics, University of California, Berkeley, Berkeley, California, United States of America
- Ava P. Amini
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Lorin Crawford
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Kevin K. Yang
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America

9
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67%, increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.

10
Adeyelu T, Bordin N, Waman VP, Sadlej M, Sillitoe I, Moya-Garcia AA, Orengo CA. KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units. Biomolecules 2023; 13:277. [PMID: 36830646 PMCID: PMC9953599 DOI: 10.3390/biom13020277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023] Open
Abstract
Protein kinases are important targets for treating human disorders, and they are the second most targeted families after G-protein coupled receptors. Several resources provide classification of kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses multi-domain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared to the widely used KinBase classification. Our protocol provides a comprehensive classification for kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 are provided via our CATH FTP website and Zenodo. This gives the domain structure representative of each KinFam together with information on any drug compounds available. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.
Affiliation(s)
- Tolulope Adeyelu
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Department of Comparative Biomedical Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P. Waman
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Marta Sadlej
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Aurelio A. Moya-Garcia
  - Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
  - Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, 29071 Málaga, Spain
- Christine A. Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
11
|
Sen N, Madhusudhan MS. A structural database of chain–chain and domain–domain interfaces of proteins. Protein Sci 2022. [DOI: 10.1002/pro.4406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Neeladri Sen
  - Indian Institute of Science Education and Research, Pune, India
  - Institute of Structural and Molecular Biology, University College London, London, UK
12
|
Romei M, Sapriel G, Imbert P, Jamay T, Chomilier J, Lecointre G, Carpentier M. Protein folds as synapomorphies of the tree of life. Evolution 2022; 76:1706-1719. [PMID: 35765784 PMCID: PMC9541633 DOI: 10.1111/evo.14550] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]
Abstract
Several studies showed that the distribution of folds (topology of protein secondary structures) in proteomes may be a global proxy to build phylogeny. Then, some folds should be synapomorphies (derived characters exclusively shared among taxa). However, previous studies used methods that did not allow synapomorphy identification, which requires congruence analysis of folds as individual characters. Here, we map SCOP folds onto a sample of 210 species across the tree of life (TOL). Congruence is assessed using the retention index of each fold for the TOL, and principal component analysis for deeper branches. Using a bicluster mapping approach, we define synapomorphic blocks of folds (SBF) sharing similar presence/absence patterns. Among the 1232 folds, 20% are universally present in our TOL, whereas 54% are reliable synapomorphies. These results are similar with the CATH and ECOD databases. Eukaryotes are characterized by a large number of them, and several SBFs clearly support nested eukaryotic clades (divergence times from 1100 to 380 mya). Although clearly separated, the three superkingdoms reveal a strong mosaic pattern. This pattern is consistent with the dual origin of eukaryotes and witnesses secondary endosymbiosis in their photosynthetic clades. Our study unveils direct analysis of fold synapomorphies as key characters to unravel the evolutionary history of species.
Affiliation(s)
- Martin Romei
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
  - IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
- Guillaume Sapriel
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
  - UFR des sciences de la santé, Université Versailles-St-Quentin, Versailles, France
- Pierre Imbert
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Théo Jamay
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Guillaume Lecointre
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Mathilde Carpentier
  - Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
13
|
Sen N, Anishchenko I, Bordin N, Sillitoe I, Velankar S, Baker D, Orengo C. Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs. Brief Bioinform 2022; 23:bbac187. [PMID: 35641150 PMCID: PMC9294430 DOI: 10.1093/bib/bbac187] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/05/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022]
Abstract
Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.
Affiliation(s)
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ivan Anishchenko
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
14
|
Merrill BD, Carter MM, Olm MR, Dahan D, Tripathi S, Spencer SP, Yu B, Jain S, Neff N, Jha AR, Sonnenburg ED, Sonnenburg JL. Ultra-deep Sequencing of Hadza Hunter-Gatherers Recovers Vanishing Microbes. [PMID: 36238714 PMCID: PMC9558438 DOI: 10.1101/2022.03.30.486478] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/03/2023]
Abstract
The gut microbiome is a key modulator of immune and metabolic health. Human microbiome data is biased towards industrialized populations, providing limited understanding of the distinct and diverse non-industrialized microbiomes. Here, we performed ultra-deep metagenomic sequencing and strain cultivation on 351 fecal samples from the Hadza, hunter-gatherers in Tanzania, and comparative populations in Nepal and California. We recover 94,971 total genomes of bacteria, archaea, bacteriophages, and eukaryotes, 43% of which are absent from existing unified datasets. Analysis of in situ growth rates, genetic pN/pS signatures, high-resolution strain tracking, and 124 gut-resident species vanishing in industrialized populations reveals differentiating dynamics of the Hadza gut microbiome. Industrialized gut microbes are enriched in genes associated with oxidative stress, possibly a result of microbiome adaptation to inflammatory processes. This unparalleled view of the Hadza gut microbiome provides a valuable resource that expands our understanding of microbes capable of colonizing the human gut and clarifies the extensive perturbation brought on by the industrialized lifestyle.
15
|
Chen LX, Jaffe AL, Borges AL, Penev PI, Nelson TC, Warren LA, Banfield JF. Phage-encoded ribosomal protein S21 expression is linked to late-stage phage replication. ISME Communications 2022; 2:31. [PMID: 37938675 PMCID: PMC9723584 DOI: 10.1038/s43705-022-00111-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/10/2021] [Revised: 01/31/2022] [Accepted: 02/03/2022] [Indexed: 06/16/2023]
Abstract
The ribosomal protein S21 (bS21) gene has been detected in diverse viruses with a large range of genome sizes, yet its in situ expression and potential significance have not been investigated. Here, we report five closely related clades of bacteriophages (phages) represented by 47 genomes (8 curated to completion and up to 331 kbp in length) that encode a bS21 gene. The bS21 gene is on the reverse strand within a conserved region that encodes the large terminase, major capsid protein, prohead protease, portal vertex proteins, and some hypothetical proteins. Based on CRISPR spacer targeting, the predominance of bacterial taxonomic affiliations of phage genes with those from Bacteroidetes, and the high sequence similarity of the phage bS21 genes and those from Bacteroidetes classes of Flavobacteriia, Cytophagia and Saprospiria, these phages are predicted to infect diverse Bacteroidetes species that inhabit a range of depths in freshwater lakes. Thus, bS21 phages have the potential to impact microbial community composition and carbon turnover in lake ecosystems. The transcriptionally active bS21-encoding phages were likely in the late stage of replication when collected, as core structural genes and bS21 were highly expressed. Thus, our analyses suggest that the phage bS21, which is involved in translation initiation, substitutes into the Bacteroidetes ribosomes and selects preferentially for phage transcripts during the late-stage replication when large-scale phage protein production is required for assembly of phage particles.
Affiliation(s)
- Lin-Xing Chen
  - Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Alexander L Jaffe
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Adair L Borges
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Petar I Penev
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Lesley A Warren
  - Department of Civil and Mineral Engineering, University of Toronto, Toronto, ON, Canada
- Jillian F Banfield
  - Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
16
|
Exploiting protein family and protein network data to identify novel drug targets for bladder cancer. Oncotarget 2022; 13:105-117. [PMID: 35035776 PMCID: PMC8758182 DOI: 10.18632/oncotarget.28175] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Accepted: 12/08/2021] [Indexed: 12/11/2022]
Abstract
Bladder cancer remains one of the most common forms of cancer and yet there are limited small molecule targeted therapies. Here, we present a computational platform to identify new potential targets for bladder cancer therapy. Our method initially exploited a set of known driver genes for bladder cancer combined with predicted bladder cancer genes from mutationally enriched protein domain families. We enriched this initial set of genes using protein network data to identify a comprehensive set of 323 putative bladder cancer targets. Pathway and cancer hallmarks analyses highlighted putative mechanisms in agreement with those previously reported for this cancer and revealed protein network modules highly enriched in potential drivers likely to be good targets for targeted therapies. 21 of our potential drug targets are targeted by FDA approved drugs for other diseases — some of them are known drivers or are already being targeted for bladder cancer (FGFR3, ERBB3, HDAC3, EGFR). A further 4 potential drug targets were identified by inheriting drug mappings across our in-house CATH domain functional families (FunFams). Our FunFam data also allowed us to identify drug targets in families that are less prone to side effects i.e., where structurally similar protein domain relatives are less dispersed across the human protein network. We provide information on our novel potential cancer driver genes, together with information on pathways, network modules and hallmarks associated with the predicted and known bladder cancer drivers and we highlight those drivers we predict to be likely drug targets.
17
|
Lou YC, Olm MR, Diamond S, Crits-Christoph A, Firek BA, Baker R, Morowitz MJ, Banfield JF. Infant gut strain persistence is associated with maternal origin, phylogeny, and traits including surface adhesion and iron acquisition. Cell Rep Med 2021; 2:100393. [PMID: 34622230 PMCID: PMC8484513 DOI: 10.1016/j.xcrm.2021.100393] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/30/2021] [Revised: 05/11/2021] [Accepted: 08/11/2021] [Indexed: 12/24/2022]
Abstract
Gut microbiome succession affects infant development. However, it remains unclear what factors promote persistence of initial bacterial colonizers in the developing gut. Here, we perform strain-resolved analyses to compare gut colonization of preterm and full-term infants throughout the first year of life and evaluate associations between strain persistence and strain origin as well as genetic potential. Analysis of fecal metagenomes collected from 13 full-term and 9 preterm infants reveals that infants' initially distinct microbiomes converge by age 1 year. Approximately 11% of early colonizers, primarily Bacteroides and Bifidobacterium, persist during the first year of life, and those are more prevalent in full-term, compared with preterm infants. Examination of 17 mother-infant pairs reveals maternal gut strains are significantly more likely to persist in the infant gut than other strains. Enrichment in genes for surface adhesion, iron acquisition, and carbohydrate degradation may explain persistence of some strains through the first year of life.
Affiliation(s)
- Yue Clare Lou
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Matthew R. Olm
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Spencer Diamond
  - Department of Earth and Planetary Science, University of California, Berkeley, CA 94709, USA
- Alexander Crits-Christoph
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Brian A. Firek
  - Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Robyn Baker
  - Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Michael J. Morowitz
  - Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Jillian F. Banfield
  - Department of Earth and Planetary Science, University of California, Berkeley, CA 94709, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA 94720, USA
  - Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94705, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
18
|
O’Donoghue SI, Schafferhans A, Sikta N, Stolte C, Kaur S, Ho BK, Anderson S, Procter JB, Dallago C, Bordin N, Adcock M, Rost B. SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms. Mol Syst Biol 2021; 17:e10079. [PMID: 34519429 PMCID: PMC8438690 DOI: 10.15252/msb.202010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/24/2020] [Revised: 08/05/2021] [Accepted: 08/06/2021] [Indexed: 01/18/2023]
Abstract
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
MESH Headings
- Amino Acid Transport Systems, Neutral/chemistry
- Amino Acid Transport Systems, Neutral/genetics
- Amino Acid Transport Systems, Neutral/metabolism
- Angiotensin-Converting Enzyme 2/chemistry
- Angiotensin-Converting Enzyme 2/genetics
- Angiotensin-Converting Enzyme 2/metabolism
- Binding Sites
- COVID-19/genetics
- COVID-19/metabolism
- COVID-19/virology
- Computational Biology/methods
- Coronavirus Envelope Proteins/chemistry
- Coronavirus Envelope Proteins/genetics
- Coronavirus Envelope Proteins/metabolism
- Coronavirus Nucleocapsid Proteins/chemistry
- Coronavirus Nucleocapsid Proteins/genetics
- Coronavirus Nucleocapsid Proteins/metabolism
- Host-Pathogen Interactions/genetics
- Humans
- Mitochondrial Membrane Transport Proteins/chemistry
- Mitochondrial Membrane Transport Proteins/genetics
- Mitochondrial Membrane Transport Proteins/metabolism
- Mitochondrial Precursor Protein Import Complex Proteins
- Models, Molecular
- Molecular Mimicry
- Neuropilin-1/chemistry
- Neuropilin-1/genetics
- Neuropilin-1/metabolism
- Phosphoproteins/chemistry
- Phosphoproteins/genetics
- Phosphoproteins/metabolism
- Protein Binding
- Protein Conformation, alpha-Helical
- Protein Conformation, beta-Strand
- Protein Interaction Domains and Motifs
- Protein Interaction Mapping/methods
- Protein Multimerization
- Protein Processing, Post-Translational
- SARS-CoV-2/chemistry
- SARS-CoV-2/genetics
- SARS-CoV-2/metabolism
- Spike Glycoprotein, Coronavirus/chemistry
- Spike Glycoprotein, Coronavirus/genetics
- Spike Glycoprotein, Coronavirus/metabolism
- Viral Matrix Proteins/chemistry
- Viral Matrix Proteins/genetics
- Viral Matrix Proteins/metabolism
- Viroporin Proteins/chemistry
- Viroporin Proteins/genetics
- Viroporin Proteins/metabolism
- Virus Replication
Affiliation(s)
- Seán I O’Donoghue
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - CSIRO Data61, Canberra, ACT, Australia
  - School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
- Andrea Schafferhans
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - Department of Bioengineering Sciences, Weihenstephan-Tr. University of Applied Sciences, Freising, Germany
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
- Neblina Sikta
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Sandeep Kaur
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
- Bosco K Ho
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Christian Dallago
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
19
|
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
  - PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
20
|
Crits-Christoph A, Bhattacharya N, Olm MR, Song YS, Banfield JF. Transporter genes in biosynthetic gene clusters predict metabolite characteristics and siderophore activity. Genome Res 2021; 31:239-250. [PMID: 33361114 PMCID: PMC7849407 DOI: 10.1101/gr.268169.120] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/30/2020] [Accepted: 12/16/2020] [Indexed: 12/27/2022]
Abstract
Biosynthetic gene clusters (BGCs) are operonic sets of microbial genes that synthesize specialized metabolites with diverse functions, including siderophores and antibiotics, which often require export to the extracellular environment. For this reason, genes for transport across cellular membranes are essential for the production of specialized metabolites and are often genomically colocalized with BGCs. Here, we conducted a comprehensive computational analysis of transporters associated with characterized BGCs. In addition to known exporters, in BGCs we found many importer-specific transmembrane domains that co-occur with substrate binding proteins possibly for uptake of siderophores or metabolic precursors. Machine learning models using transporter gene frequencies were predictive of known siderophore activity, molecular weights, and a measure of lipophilicity (log P) for corresponding BGC-synthesized metabolites. Transporter genes associated with BGCs were often equally or more predictive of metabolite features than biosynthetic genes. Given the importance of siderophores as pathogenicity factors, we used transporters specific for siderophore BGCs to identify both known and uncharacterized siderophore-like BGCs in genomes from metagenomes from the infant and adult gut microbiome. We find that 23% of microbial genomes from premature infant guts have siderophore-like BGCs, but only 3% of those assembled from adult gut microbiomes do. Although siderophore-like BGCs from the infant gut are predominantly associated with Enterobacteriaceae and Staphylococcus, siderophore-like BGCs can be identified from taxa in the adult gut microbiome that have rarely been recognized for siderophore production. Taken together, these results show that consideration of BGC-associated transporter genes can inform predictions of specialized metabolite structure and function.
Affiliation(s)
- Alexander Crits-Christoph
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Innovative Genomics Institute, Berkeley, California 94720, USA
- Nicholas Bhattacharya
  - Department of Mathematics, University of California, Berkeley, California 94720, USA
- Matthew R Olm
  - Department of Microbiology and Immunology, Stanford University, California 94305, USA
- Yun S Song
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
  - Department of Statistics, University of California, Berkeley, California 94720, USA
  - Chan Zuckerberg Biohub, San Francisco, California 94158, USA
- Jillian F Banfield
  - Innovative Genomics Institute, Berkeley, California 94720, USA
  - Department of Microbiology and Immunology, Stanford University, California 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, California 94158, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, California 94720, USA
  - Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
21
|
Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, Pang CSM, Woodridge L, Rauer C, Sen N, Abbasian M, Le Cornu S, Lam SD, Berka K, Varekova I, Svobodova R, Lees J, Orengo CA. CATH: increased structural coverage of functional space. Nucleic Acids Res 2021; 49:D266-D273. [PMID: 33237325 PMCID: PMC7778904 DOI: 10.1093/nar/gkaa1079] [Citation(s) in RCA: 199] [Impact Index Per Article: 66.3] [Received: 09/22/2020] [Revised: 10/20/2020] [Accepted: 11/02/2020] [Indexed: 12/11/2022]
Abstract
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with the addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Natalie Dawson
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Paul Ashford
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Laurel Woodridge
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Sean Le Cornu
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Su Datt Lam
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
- Karel Berka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc, Olomouc 771 46, Czech Republic
- Ivana Hutařová Varekova
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno 602 00, Czech Republic
- Radka Svobodova
- Central European Institute of Technology, Masaryk University, Brno 625 00, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno 602 00, Czech Republic
- Jon Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford OX3 0BP, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
|
22
|
Lu L, Loker ES, Adema CM, Zhang SM, Bu L. Genomic and transcriptional analysis of genes containing fibrinogen and IgSF domains in the schistosome vector Biomphalaria glabrata, with emphasis on the differential responses of snails susceptible or resistant to Schistosoma mansoni. PLoS Negl Trop Dis 2020; 14:e0008780. [PMID: 33052953 PMCID: PMC7588048 DOI: 10.1371/journal.pntd.0008780] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/20/2020] [Revised: 10/26/2020] [Accepted: 09/08/2020] [Indexed: 12/31/2022]
Abstract
Achieving a deeper understanding of the factors controlling the defense responses of invertebrate vectors to the human-infecting pathogens they transmit will provide much-needed new leads for control. Consequently, we provide new genomic and transcriptomic insights regarding FReDs (containing a fibrinogen domain) and FREPs (fibrinogen domain and one or two IgSF domains) from the planorbid snail Biomphalaria glabrata, a Neotropical vector of Schistosoma mansoni, causative agent of human intestinal schistosomiasis. Using new bioinformatics approaches to improve annotation, applied to both genome and RNA-Seq data, we identify 73 FReD genes, 39 of which are FREPs. We provide details of domain structure and consider relationships and homologies of B. glabrata FBG and IgSF domains. We note that schistosome-resistant (BS-90) snails mount complex FREP responses following exposure to S. mansoni infection whereas schistosome-susceptible (M line) snails do not. We also identify several coding differences between BS-90 and M line snails in three FREPs (2, 3.1 and 3.2) repeatedly implicated in other studies of anti-schistosome responses. In combination with other results, our study provides a strong impetus to pursue particular FREPs (2, 3.1, 3.2 and 4) as candidate resistance factors to be considered more broadly with respect to schistosome control efforts, including those involving other Biomphalaria species vectoring S. mansoni in endemic areas in Africa.
Affiliation(s)
- Lijun Lu
- Center for Evolutionary and Theoretical Immunology (CETI), Department of Biology, University of New Mexico, Albuquerque, New Mexico, United States of America
- Eric S. Loker
- Center for Evolutionary and Theoretical Immunology (CETI), Department of Biology, University of New Mexico, Albuquerque, New Mexico, United States of America
- Coen M. Adema
- Center for Evolutionary and Theoretical Immunology (CETI), Department of Biology, University of New Mexico, Albuquerque, New Mexico, United States of America
- Si-Ming Zhang
- Center for Evolutionary and Theoretical Immunology (CETI), Department of Biology, University of New Mexico, Albuquerque, New Mexico, United States of America
- Lijing Bu
- Center for Evolutionary and Theoretical Immunology (CETI), Department of Biology, University of New Mexico, Albuquerque, New Mexico, United States of America
|