1. González Dalmasy JM, Fitzsimmons CM, Frye WJE, Perciaccante AJ, Jewell CP, Jenkins LM, Batista PJ, Robey RW, Gottesman MM. The thiol methyltransferase activity of TMT1A (METTL7A) is conserved across species. Chem Biol Interact 2024; 394:110989. [PMID: 38574836 PMCID: PMC11056289 DOI: 10.1016/j.cbi.2024.110989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 03/10/2024] [Accepted: 04/02/2024] [Indexed: 04/06/2024]
Abstract
Although few resistance mechanisms for histone deacetylase inhibitors (HDACis) have been described, we recently demonstrated that TMT1A (formerly METTL7A) and TMT1B (formerly METTL7B) can mediate resistance to HDACis with a thiol as the zinc-binding group by methylating and inactivating the drug. TMT1A and TMT1B are poorly characterized, and their normal physiological role has yet to be determined. As animal model systems are often used to determine the physiological function of proteins, we investigated whether the ability of these methyltransferases to methylate thiol-based HDACis is conserved across different species. We found that TMT1A was conserved across rats, mice, chickens, and zebrafish, displaying 85.7%, 84.8%, 60.7%, and 51.0% amino acid sequence identity, respectively, with human TMT1A. Because TMT1B was not found in the chicken or zebrafish, we focused our studies on the TMT1A homologs. HEK-293 cells were transfected to express mouse, rat, chicken, or zebrafish homologs of TMT1A, and all conferred resistance to the thiol-based HDACis NCH-51, KD-5170, and romidepsin compared to empty vector-transfected cells. Additionally, all homologs blunted the downstream effects of HDACi treatment such as increased p21 expression, increased acetylated histone H3, and cell cycle arrest. Increased levels of dimethylated romidepsin were also found in the culture medium of cells transfected to express any of the TMT1A homologs after a 24 h incubation with romidepsin compared to empty vector-transfected cells. Our results indicate that the ability of TMT1A to methylate molecules is conserved across species. Animal models may therefore be useful in elucidating the role of these enzymes in humans.
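The identity percentages quoted in this abstract are the kind of figure obtained by counting matching residues in a pairwise alignment. The sketch below shows one common way to compute it, assuming the two sequences are already aligned with `-` as the gap character and that gapped columns are excluded from the denominator. The abstract does not say which alignment tool or denominator the authors used, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gapped columns are skipped; other tools divide by the
    alignment length or by the shorter sequence length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from TMT1A.
toy_identity = percent_identity("MKT-LLV", "MRTALLV")
print(round(toy_identity, 1))
```

Because the choice of denominator changes the result, identity values from different tools are only comparable when the same convention is used.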
Affiliation(s)
- José M González Dalmasy
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Christina M Fitzsimmons
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- William J E Frye
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Andrew J Perciaccante
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Connor P Jewell
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Lisa M Jenkins
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Pedro J Batista
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Robert W Robey
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Michael M Gottesman
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
2. González Dalmasy JM, Fitzsimmons CM, Frye WJ, Perciaccante AJ, Jewell CP, Jenkins LM, Batista PJ, Robey RW, Gottesman MM. The thiol methyltransferase activity of TMT1A (METTL7A) is conserved across species. bioRxiv 2023:2023.11.17.567538. [PMID: 38076968 PMCID: PMC10705543 DOI: 10.1101/2023.11.17.567538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/22/2023]
Abstract
Although few resistance mechanisms for histone deacetylase inhibitors (HDACis) have been described, we recently demonstrated that TMT1A (formerly METTL7A) and TMT1B (formerly METTL7B) can mediate resistance to HDACis with a thiol as the zinc-binding group by methylating and inactivating the drug. TMT1A and TMT1B are poorly characterized, and their normal physiological role has yet to be determined. As animal model systems are often used to determine the physiological function of proteins, we investigated whether the ability of these methyltransferases to methylate thiol-based HDACis is conserved across different species. We found that TMT1A was conserved across rats, mice, chickens, and zebrafish, displaying 85.7%, 84.8%, 60.7%, and 51.0% amino acid sequence identity, respectively, with human TMT1A. Because TMT1B was not found in the chicken or zebrafish, we focused our studies on the TMT1A homologs. HEK-293 cells were transfected to express mouse, rat, chicken, or zebrafish homologs of TMT1A, and all conferred resistance to the thiol-based HDACis NCH-51, KD-5170, and romidepsin compared to empty vector-transfected cells. Additionally, all homologs blunted the downstream effects of HDACi treatment such as increased p21 expression, increased acetylated histone H3, and cell cycle arrest. Increased levels of dimethylated romidepsin were also found in the culture medium of cells transfected to express any of the TMT1A homologs after a 24 h incubation with romidepsin compared to empty vector-transfected cells. Our results indicate that the ability of TMT1A to methylate molecules is conserved across species. Animal models may therefore be useful in elucidating the role of these enzymes in humans.
Affiliation(s)
- José M. González Dalmasy
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Christina M. Fitzsimmons
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- William J.E. Frye
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Andrew J. Perciaccante
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Connor P. Jewell
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Lisa M. Jenkins
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Pedro J. Batista
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Robert W. Robey
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Michael M. Gottesman
  - Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
3. Schütze K, Heinzinger M, Steinegger M, Rost B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. Frontiers in Bioinformatics 2022; 2:1033775. [PMID: 36466147 PMCID: PMC9714024 DOI: 10.3389/fbinf.2022.1033775] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/31/2022] [Accepted: 10/31/2022] [Indexed: 11/29/2023]
Abstract
Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as "homology detection") use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of <20% pairwise sequence identity, PIDE). While this approach excelled for proteins with single domains, we demonstrated the current challenges in applying it to multi-domain proteins and presented some ideas on how to overcome existing limitations, in principle. We observed that sufficiently challenging data set separations were crucial to provide deeply relevant insights into the behavior of nearest neighbor search when applied to the protein embedding space, and made all our methods readily available for others.
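A nearest neighbor search over embeddings, as described in this abstract, can be sketched as ranking database vectors by their distance to a query vector. The example below is a brute-force illustration under stated assumptions: the three-dimensional vectors, the protein names, and the choice of Euclidean distance are all hypothetical, and the abstract does not specify the index structure or metric the authors used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest_neighbors(query, database, k=1):
    """Return the k (protein_id, distance) pairs closest to the query embedding.

    Brute force: every database entry is compared with the query.
    """
    scored = [(pid, euclidean(query, emb)) for pid, emb in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy embeddings; real pLM embeddings have far more dimensions.
toy_db = {
    "protA": [0.0, 1.0, 0.0],
    "protB": [1.0, 0.0, 0.0],
    "protC": [0.9, 0.1, 0.0],
}
hits = nearest_neighbors([1.0, 0.0, 0.1], toy_db, k=2)
print([pid for pid, _ in hits])
```

Brute force scales linearly with database size; searches over large databases usually rely on an approximate index, which this sketch does not attempt.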
Affiliation(s)
- Konstantin Schütze
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Munich, Germany
- Michael Heinzinger
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Germany & TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany
4. Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022]
Abstract
Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce using single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, has a better ability to recognize distant homologous relationships than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that ascertained performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
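Contrastive learning on embeddings pulls proteins from the same structural class together and pushes others apart. The abstract does not state which loss was used, so the sketch below uses a triplet margin loss, one common contrastive objective, purely as an assumption. The vectors, the margin value, and the function names are hypothetical, and no training loop is shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)  # same class, should be small
    d_neg = euclidean(anchor, negative)  # different class, should be large
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2D embeddings: the first triplet is well separated,
# the second still violates the margin.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

In a hierarchical classification such as CATH, which pairs count as positives depends on the level of the hierarchy being compared; that sampling logic is outside the scope of this sketch.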
Affiliation(s)
- Michael Heinzinger
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Maria Littmann
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
5. Juste C, Gérard P. Cholesterol-to-Coprostanol Conversion by the Gut Microbiota: What We Know, Suspect, and Ignore. Microorganisms 2021; 9:1881. [PMID: 34576776 PMCID: PMC8468837 DOI: 10.3390/microorganisms9091881] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/30/2021] [Revised: 08/24/2021] [Accepted: 09/01/2021] [Indexed: 12/12/2022]
Abstract
Every day, up to 1 g of cholesterol, composed of the unabsorbed dietary cholesterol, the biliary cholesterol secretion, and cholesterol of cells sloughed from the intestinal epithelium, enters the colon. All cholesterol arriving in the large intestine can be metabolized by the colonic bacteria. Cholesterol is mainly converted into coprostanol, a non-absorbable sterol that is excreted in the feces. Interestingly, cholesterol-to-coprostanol conversion in human populations is variable, with a majority of high converters and a minority of low or inefficient converters. Two major pathways have been proposed: a direct pathway involving the stereospecific reduction of the Δ5 double bond, and an indirect pathway involving the intermediate formation of 4-cholesten-3-one and coprostanone. Despite the fact that intestinal cholesterol conversion was discovered more than a century ago, only a few cholesterol-to-coprostanol-converting bacterial strains have been isolated and characterized. Moreover, the responsible genes were mainly unknown until recently. Interestingly, cholesterol-to-coprostanol conversion is highly regulated by the diet. Finally, this gut bacterial metabolism has been linked to health and disease, and recent evidence suggests it could contribute to lower blood cholesterol and cardiovascular risks.
Affiliation(s)
- Philippe Gérard
  - AgroParisTech, Micalis Institute, Université Paris-Saclay, INRAE, 78350 Jouy-en-Josas, France
6. Spectrum of Protein Location in Proteomes Captures Evolutionary Relationship Between Species. J Mol Evol 2021; 89:544-553. [PMID: 34328525 PMCID: PMC8379119 DOI: 10.1007/s00239-021-10022-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/05/2020] [Accepted: 07/16/2021] [Indexed: 11/10/2022]
Abstract
The native subcellular location (also referred to as localization or cellular compartment) of a protein is the one in which it acts most frequently; it is one aspect of protein function. Do ten eukaryotic model organisms differ in their location spectrum, i.e., the fraction of the proteome in each of seven major cellular compartments? As experimental annotations of locations remain biased and incomplete, we need prediction methods to answer this question. After systematic bias corrections, the complete but faulty prediction methods appeared to be more appropriate for comparing location spectra between species than the incomplete but more accurate experimental data. This work compared the location spectra for ten eukaryotes: Homo sapiens (human), Gorilla gorilla (gorilla), Pan troglodytes (chimpanzee), Mus musculus (mouse), Rattus norvegicus (rat), Drosophila melanogaster (fruit/vinegar fly), Anopheles gambiae (African malaria mosquito), Caenorhabditis elegans (nematode), Saccharomyces cerevisiae (baker’s yeast), and Schizosaccharomyces pombe (fission yeast). The two largest classes were predicted to be the nucleus and the cytoplasm, together accounting for 47–62% of all proteins, while 7–21% of the proteins were predicted to be in the plasma membrane and 4–15% to be secreted. Overall, the predicted location spectra were largely similar. However, in detail, the differences sufficed to plot trees (UPGMA) and 2D (PCA) maps relating the ten organisms using a simple Euclidean distance in seven states (location classes). The relations based on the simple predicted location spectra captured aspects of cross-species comparisons usually revealed only by much more detailed evolutionary comparisons. Most interestingly, known phylogenetic relations were reproduced better by paralog-only than by ortholog-only trees.
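The distance underlying the trees and maps mentioned in this abstract is a plain Euclidean distance between seven-component location spectra. The sketch below illustrates that step only, under stated assumptions: the species labels and fractions are hypothetical placeholders rather than values from the paper, and the UPGMA and PCA steps are not reproduced.

```python
import math
from itertools import combinations


def spectrum_distance(spec_a, spec_b):
    """Euclidean distance between two location spectra of equal length."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must cover the same location classes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


def pairwise_distances(spectra):
    """Distance for every unordered pair of organisms in `spectra`."""
    return {
        (x, y): spectrum_distance(spectra[x], spectra[y])
        for x, y in combinations(sorted(spectra), 2)
    }


# Hypothetical fractions over seven location classes; each row sums to 1.
toy_spectra = {
    "species_x": [0.30, 0.25, 0.15, 0.10, 0.10, 0.05, 0.05],
    "species_y": [0.28, 0.27, 0.15, 0.10, 0.10, 0.05, 0.05],
    "species_z": [0.20, 0.25, 0.25, 0.10, 0.10, 0.05, 0.05],
}
for pair, dist in pairwise_distances(toy_spectra).items():
    print(pair, round(dist, 4))
```

The resulting pairwise table is the input a hierarchical clustering method such as UPGMA would consume.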
7. Hsin KT, Yang TJ, Lee YH, Cheng YS. Phylogenetic and Structural Analysis of NIN-Like Proteins With a Type I/II PB1 Domain That Regulates Oligomerization for Nitrate Response. Frontiers in Plant Science 2021; 12:672035. [PMID: 34135927 PMCID: PMC8200828 DOI: 10.3389/fpls.2021.672035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/25/2021] [Accepted: 05/05/2021] [Indexed: 06/12/2023]
Abstract
Absorption of macronutrients such as nitrogen is a critical process for land plants. There is little information available on the correlation between the root evolution of land plants and the protein regulation of nitrogen absorption and responses. NIN-like protein (NLP) transcription factors contain a Phox and Bem1 (PB1) domain, which may regulate nitrate-response genes and seem to be involved in the adaptation to growing on land in terms of plant root development. In this report, we reveal the NLP phylogeny in land plants and the origin of NLP genes that may be involved in the nitrate-signaling pathway. Our NLP phylogeny showed that duplication of NLP genes occurred before the divergence of chlorophytes and land plants. Duplicated NLP genes may have been lost in most chlorophyte lineages. The NLP genes of bryophytes were initially monophyletic, but this was followed by divergence of lycophyte NLP genes and then angiosperm NLP genes. Among the identified NLP genes, PB1, a protein-protein interaction domain, was found across our phylogeny. To understand how protein-protein interactions are mediated via the PB1 domain, we examined the PB1 domain of Arabidopsis thaliana NLP7 (AtNLP7) as a representative in terms of its molecular oligomerization and function. Based on the structure of the PB1 domain, determined using small-angle X-ray scattering (SAXS) and site-directed mutagenesis, we found that the NLP7 PB1 protein forms oligomers and that several key residues (K867 and D909/D911/E913/D922 in the OPCA motif) play a pivotal role in the oligomerization of NLP7 proteins. The fact that these residues are all conserved across land plant lineages means that this oligomerization may have evolved after the common ancestor of extant land plants colonized the land. It would then have rapidly become established across land-plant lineages in order to mediate protein-protein interactions in the nitrate-signaling pathway.
Affiliation(s)
- Kuan-Ting Hsin
  - Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
- Tzu-Jing Yang
  - Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
  - Institute of Biochemical Sciences, College of Life Science, National Taiwan University, Taipei, Taiwan
  - Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, Taiwan
- Yu-Hsuan Lee
  - Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
- Yi-Sheng Cheng
  - Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
  - Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, Taipei, Taiwan
8. Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jonathan G Lees
  - Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
9. Zohra Smaili F, Tian S, Roy A, Alazmi M, Arold ST, Mukherjee S, Scott Hefty P, Chen W, Gao X. QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs. Genomics Proteomics & Bioinformatics 2021; 19:998-1011. [PMID: 33631427 PMCID: PMC9403031 DOI: 10.1016/j.gpb.2021.02.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/11/2018] [Revised: 04/03/2019] [Accepted: 05/17/2019] [Indexed: 11/25/2022]
Abstract
The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
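The abstract states that three sources of evidence are combined by consensus averaging. A minimal sketch of such an averaging step follows. It assumes that each source yields a confidence score per GO term, that a term missing from a source counts as zero, and that all sources are weighted equally unless weights are given. None of these details are confirmed by the abstract, and the scores are hypothetical.

```python
def consensus_average(score_sets, weights=None):
    """Combine per-term scores from several sources by (weighted) averaging.

    score_sets: list of {term: score} dictionaries, one per source.
    Assumption: a term absent from a source contributes a score of 0.
    """
    if weights is None:
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("need exactly one weight per source")
    total_weight = sum(weights)
    terms = set().union(*score_sets)  # every term seen by any source
    return {
        term: sum(w * s.get(term, 0.0) for w, s in zip(weights, score_sets))
        / total_weight
        for term in terms
    }


# Hypothetical scores from the three kinds of evidence named in the abstract.
structure_scores = {"GO:0003824": 0.9, "GO:0005515": 0.3}
interaction_scores = {"GO:0005515": 0.6}
motif_scores = {"GO:0003824": 0.6}

combined = consensus_average([structure_scores, interaction_scores, motif_scores])
for term in sorted(combined):
    print(term, round(combined[term], 3))
```

Treating a missing term as zero penalizes predictions supported by a single source; averaging only over sources that report the term would be an equally plausible reading.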
Affiliation(s)
- Fatima Zohra Smaili
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Shuye Tian
  - Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China
- Ambrish Roy
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Meshari Alazmi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - College of Computer Science and Engineering, University of Hail, Hail 55476, Saudi Arabia
- Stefan T Arold
  - Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Srayanta Mukherjee
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- P Scott Hefty
  - Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA
- Wei Chen
  - Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
10. Littmann M, Heinzinger M, Dallago C, Olenyi T, Rost B. Embeddings from deep learning transfer GO annotations beyond homology. Sci Rep 2021; 11:1160. [PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 09/07/2020] [Accepted: 12/24/2020] [Indexed: 11/09/2022]
Abstract
Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
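The Fmax values reported in this abstract are maxima of the F1 score over decision thresholds. The sketch below follows the protein-centric definition commonly used in CAFA as an assumption: precision is averaged over proteins with at least one prediction at the threshold, and recall over all benchmark proteins. It omits ontology propagation, the term names and scores are hypothetical, and every benchmark protein is assumed to have at least one true term.

```python
def f_max(predictions, truth, thresholds=None):
    """Maximum F1 over score thresholds, protein-centric averaging.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy benchmark with two proteins.
toy_truth = {"p1": {"a", "b"}, "p2": {"c"}}
toy_predictions = {
    "p1": {"a": 0.9, "b": 0.4, "x": 0.2},
    "p2": {"c": 0.8, "y": 0.7},
}
print(round(f_max(toy_predictions, toy_truth), 3))
```

Because the threshold is chosen after the fact to maximize F1, Fmax is an optimistic summary and is best compared only between methods evaluated on the same benchmark.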
Affiliation(s)
- Maria Littmann
  - Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Michael Heinzinger
  - Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
  - Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Tobias Olenyi
  - Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Burkhard Rost
  - Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
  - School of Life Sciences Weihenstephan (TUM-WZW), TUM (Technical University of Munich), Alte Akademie 8, Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
11. Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India
12. Krtenic B, Drazic A, Arnesen T, Reuter N. Classification and phylogeny for the annotation of novel eukaryotic GNAT acetyltransferases. PLoS Comput Biol 2020; 16:e1007988. [PMID: 33362253 PMCID: PMC7790372 DOI: 10.1371/journal.pcbi.1007988] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/21/2020] [Revised: 01/07/2021] [Accepted: 10/16/2020] [Indexed: 11/19/2022]
Abstract
The enzymes of the GCN5-related N-acetyltransferase (GNAT) superfamily count more than 870 000 members through all kingdoms of life and share the same structural fold. GNAT enzymes transfer an acyl moiety from acyl coenzyme A to a wide range of substrates including aminoglycosides, serotonin, glucosamine-6-phosphate, protein N-termini and lysine residues of histones and other proteins. The GNAT subtype of protein N-terminal acetyltransferases (NATs) alone targets a majority of all eukaryotic proteins, stressing the omnipresence of the GNAT enzymes. Despite the highly conserved GNAT fold, sequence similarity is quite low between members of this superfamily even when substrates are similar. Furthermore, this superfamily is phylogenetically not well characterized. Thus functional annotation based on sequence similarity is unreliable and strongly hampered for thousands of GNAT members that remain biochemically uncharacterized. Here we used sequence similarity networks to map the sequence space and propose a new classification for eukaryotic GNAT acetyltransferases. Using the new classification, we built a phylogenetic tree representing the entire GNAT acetyltransferase superfamily. Our results show that protein NATs have evolved more than once on the GNAT acetylation scaffold. We use our classification to predict the function of uncharacterized sequences and verify by in vitro protein assays that two fungal genes encode NAT enzymes targeting specific protein N-terminal sequences, showing that even slight changes on the GNAT fold can lead to a change in substrate specificity. In addition to providing a new map of the relationship between eukaryotic acetyltransferases, the classification proposed constitutes a tool to improve functional annotation of GNAT acetyltransferases.

Enzymes of the GCN5-related N-acetyltransferase (GNAT) superfamily transfer an acetyl group from one molecule to another. This reaction is called acetylation and is one of the most common reactions inside the cell. The GNAT superfamily counts more than 870 000 members through all kingdoms of life. Despite sharing the same fold, the GNAT superfamily is very diverse in terms of amino acid sequence and substrates. The eight N-terminal acetyltransferases (NatA, NatB, etc., to NatH) are a GNAT subtype which acetylates the free amine group of polypeptide chains. This modification is called N-terminal acetylation and is one of the most abundant protein modifications in eukaryotic cells. This subtype is also characterized by a high sequence diversity even though its members share the same substrate. In addition, the phylogeny of the superfamily is not characterized. This hampers functional annotation based on sequence similarity, and discovery of novel NATs. In this work we set out to solve the problem of the classification of eukaryotic GCN5-related acetyltransferases and report the first classification framework of the superfamily. This framework can be used as a tool for annotation of all GCN5-related acetyltransferases. As an example of what can be achieved, we report in this paper the computational prediction and in vitro verification of the function of two previously uncharacterized N-terminal acetyltransferases. We also report the first acetyltransferase phylogenetic tree of the GCN5 superfamily. It indicates that N-terminal acetyltransferases do not constitute one homogeneous protein family, but that the ability to bind and acetylate protein N-termini has evolved more than once on the same acetylation scaffold. We also show that even small changes in key positions can lead to altered enzyme specificity.
Affiliation(s)
- Bojan Krtenic
  - Department of Biological Sciences, University of Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Norway
  - * E-mail: (BK); (NR)
- Adrian Drazic
  - Department of Biomedicine, University of Bergen, Norway
- Thomas Arnesen
  - Department of Biological Sciences, University of Bergen, Norway
  - Department of Biomedicine, University of Bergen, Norway
  - Department of Surgery, Haukeland University Hospital, Norway
- Nathalie Reuter
  - Computational Biology Unit, Department of Informatics, University of Bergen, Norway
  - Department of Chemistry, University of Bergen, Norway
  - * E-mail: (BK); (NR)
|
13
|
Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020; 59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA, and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.
Affiliation(s)
- Mikaela R Rosen
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Janelle B Leuthaeuser
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Carol A Parish
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Jacquelyn S Fetrow
  - Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
|
14
|
de Oliveira Almeida R, Valente GT. Predicting metabolic pathways of plant enzymes without using sequence similarity: Models from machine learning. The Plant Genome 2020; 13:e20043. [PMID: 33217216 DOI: 10.1002/tpg2.20043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 06/03/2020] [Accepted: 06/10/2020] [Indexed: 06/11/2023]
Abstract
Most of the bioinformatics tools for enzyme annotation focus on enzymatic function assignments. Sequence similarity to well-characterized enzymes is often used for functional annotation and to assign metabolic pathways. However, these approaches are not feasible for all sequences, leading to inaccurate annotations or a lack of metabolic pathway information. Here we present mApLe (metabolic pathway predictor of plant enzymes), a high-performance machine learning-based tool with models to label the metabolic pathway of enzymes rather than specifying enzymes' reactions. mApLe uses molecular descriptors of the enzyme sequences to perform predictions without considering sequence similarities with reference sequences. Hence, mApLe can classify a diversity of enzymes, even the ones without any homolog or with incomplete EC numbers. This tool can be used to improve the quality of genomic annotation of plants or to narrow down the number of candidate genes for metabolic engineering research. The mApLe tool is available online, and the GUI can be locally installed.
Affiliation(s)
- Rodrigo de Oliveira Almeida
  - Instituto Federal de Educação, Ciência e Tecnologia do Sudeste de Minas Gerais, Muriaé, Brazil
  - Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
- Guilherme Targino Valente
  - Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
  - Department of Developmental Genetics, Max Planck Institut für Herz- und Lungenforschung, Bad Nauheim, Germany
|
15
|
Moore BM, Wang P, Fan P, Lee A, Leong B, Lou YR, Schenck CA, Sugimoto K, Last R, Lehti-Shiu MD, Barry CS, Shiu SH. Within- and cross-species predictions of plant specialized metabolism genes using transfer learning. In Silico Plants 2020; 2:diaa005. [PMID: 33344884 PMCID: PMC7731531 DOI: 10.1093/insilicoplants/diaa005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2020] [Accepted: 07/21/2020] [Indexed: 06/12/2023]
Abstract
Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one.
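For readers unfamiliar with the F-measure quoted in this abstract, the following is a minimal sketch of how it is commonly computed for a two-class problem such as SM versus GM genes. The function name `f_measure`, the label strings, and the toy data are illustrative assumptions; this is not the authors' pipeline, and the paper may use a different averaging scheme.

```python
def f_measure(y_true, y_pred, positive="SM"):
    """Harmonic mean of precision and recall for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
truth = ["SM", "SM", "SM", "GM", "GM", "GM"]
preds = ["SM", "SM", "GM", "SM", "GM", "GM"]
print(round(f_measure(truth, preds), 3))  # 0.667
```

A score of 1.0 corresponds to perfect predictions, which matches the upper bound the abstract mentions.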
Affiliation(s)
- Bethany M Moore
  - Department of Plant Biology, Michigan State University, East Lansing, MI, USA
  - Ecology, Evolutionary Biology, and Behavior Program, Michigan State University, East Lansing, MI, USA
- Peipei Wang
  - Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Pengxiang Fan
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Aaron Lee
  - Department of Biology, The College of New Jersey, Ewing, NJ, USA
- Bryan Leong
  - Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Yann-Ru Lou
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Craig A Schenck
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Koichi Sugimoto
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, USA
  - Science Research Center, Yamaguchi University, Yamaguchi, Japan
- Robert Last
  - Department of Plant Biology, Michigan State University, East Lansing, MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Cornelius S Barry
  - Department of Horticulture, Michigan State University, East Lansing, MI, USA
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, MI, USA
  - Ecology, Evolutionary Biology, and Behavior Program, Michigan State University, East Lansing, MI, USA
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI
|
16
|
Clark TJ, Guo L, Morgan J, Schwender J. Modeling Plant Metabolism: From Network Reconstruction to Mechanistic Models. Annual Review of Plant Biology 2020; 71:303-326. [PMID: 32017600 DOI: 10.1146/annurev-arplant-050718-100221] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 06/10/2023]
Abstract
Mathematical modeling of plant metabolism enables the plant science community to understand the organization of plant metabolism, obtain quantitative insights into metabolic functions, and derive engineering strategies for manipulation of metabolism. Among the various modeling approaches, metabolic pathway analysis can dissect the basic functional modes of subsections of core metabolism, such as photorespiration, and reveal how classical definitions of metabolic pathways have overlapping functionality. In the many studies using constraint-based modeling in plants, numerous computational tools are currently available to analyze large-scale and genome-scale metabolic networks. For 13C-metabolic flux analysis, principles of isotopic steady state have been used to study heterotrophic plant tissues, while nonstationary isotope labeling approaches are amenable to the study of photoautotrophic and secondary metabolism. Enzyme kinetic models explore pathways in mechanistic detail, and we discuss different approaches to determine or estimate kinetic parameters. In this review, we describe recent advances and challenges in modeling plant metabolism.
Affiliation(s)
- Teresa J Clark
  - Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
- Longyun Guo
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- John Morgan
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Jorg Schwender
  - Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
|
17
|
Semwal R, Aier I, Tyagi P, Varadwaj PK. DeEPn: a deep neural network based tool for enzyme functional annotation. J Biomol Struct Dyn 2020; 39:2733-2743. [PMID: 32274968 DOI: 10.1080/07391102.2020.1754292] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
Abstract
With the advancement of high throughput techniques, the discovery rate of enzyme sequences has increased significantly in the recent past. All of these raw sequences are required to be precisely mapped to their respective functional attributes, which helps in deciphering their biological role. In the recent past, various prediction models have been proposed to predict the enzyme functional class; however, all of these models were able to quantify at most six functional enzyme classes (EC1 to EC6) out of existing seven functional classes, making these approaches inappropriate for handling enzymes corresponding to the seventh functional class (EC7). In this study, a Deep Neural Network-based approach, DeEPn, has been proposed, which can quantify enzymes corresponding to all seven functional classes with high precision and accuracy. The proposed model was compared with two recently developed tools, ECPred and SVM-Prot. The result demonstrated that DeEPn outperformed ECPred and SVM-Prot in terms of predictive quality. The DeEPn tool has been hosted as a web-based tool at https://bioserver.iiita.ac.in/DeEPn/. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rahul Semwal
  - Department of Information Technology (Bioinformatics), Indian Institute of Information Technology Allahabad, Allahabad, Uttar Pradesh, India
- Imlimaong Aier
  - Department of Bioinformatics and Applied Science, Indian Institute of Information Technology, Allahabad, Allahabad, Uttar Pradesh, India
- Pankaj Tyagi
  - Department of Information Technology (Bioinformatics), Indian Institute of Information Technology Allahabad, Allahabad, Uttar Pradesh, India
- Pritish Kumar Varadwaj
  - Department of Bioinformatics and Applied Science, Indian Institute of Information Technology, Allahabad, Allahabad, Uttar Pradesh, India
|
18
|
Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020; 432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]
Abstract
The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for the understanding of almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we described a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of protein to DNA, RNA, and other proteins. Firstly, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Secondly, we developed independent methods that predict which residues bind (per-residue). Not requiring three-dimensional information, the system can predict the actual binding residue. The system combined homology-based inference with machine learning and motif-based profile-kernel approaches with word-based (ProtVec) solutions to machine learning protein level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (±one standard error) corresponding to a 1.8 fold improvement over random (best classification for protein-protein with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding residue predictions appeared best for DNA-binding (Q2 = 81 ± 0.9%) followed by RNA-binding (Q2 = 80 ± 1%) and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through github (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
Affiliation(s)
- Jiajun Qiu
  - Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Bernhofer
  - Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Michael Heinzinger
  - Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
- Sofie Kemper
  - Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
- Tomas Norambuena
  - Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
  - Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
  - Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Burkhard Rost
  - Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
  - Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West, 168th Street, New York, NY, 10032, USA
  - Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany
|
19
|
Siedhoff NE, Schwaneberg U, Davari MD. Machine learning-assisted enzyme engineering. Methods Enzymol 2020; 643:281-315. [DOI: 10.1016/bs.mie.2020.05.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 02/06/2023]
|
20
|
Šimčíková D, Heneberg P. Refinement of evolutionary medicine predictions based on clinical evidence for the manifestations of Mendelian diseases. Sci Rep 2019; 9:18577. [PMID: 31819097 PMCID: PMC6901466 DOI: 10.1038/s41598-019-54976-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/28/2018] [Accepted: 11/21/2019] [Indexed: 12/28/2022]
Abstract
Prediction methods have become an integral part of biomedical and biotechnological research. However, their clinical interpretations are largely based on biochemical or molecular data, but not clinical data. Here, we focus on improving the reliability and clinical applicability of prediction algorithms. We assembled and curated two large non-overlapping databases of clinical phenotypes. These phenotypes were caused by missense variations in 44 and 63 genes associated with Mendelian diseases. We used these databases to establish and validate the model, allowing us to improve the predictions obtained from EVmutation, SNAP2 and PoPMuSiC 2.1. The predictions of clinical effects suffered from a lack of specificity, which appears to be the common constraint of all recently used prediction methods, although predictions mediated by these methods are associated with nearly absolute sensitivity. We introduced evidence-based tailoring of the default settings of the prediction methods; this tailoring substantially improved the prediction outcomes. Additionally, the comparisons of the clinically observed and theoretical variations led to the identification of large previously unreported pools of variations that were under negative selection during molecular evolution. The evolutionary variation analysis approach described here is the first to enable the highly specific identification of likely disease-causing missense variations that have not yet been associated with any clinical phenotype.
Affiliation(s)
- Daniela Šimčíková
  - Charles University, Third Faculty of Medicine, Prague, Czech Republic
- Petr Heneberg
  - Charles University, Third Faculty of Medicine, Prague, Czech Republic
|
21
|
Mahlich Y, Steinegger M, Rost B, Bromberg Y. HFSP: high speed homology-driven function annotation of proteins. Bioinformatics 2019; 34:i304-i312. [PMID: 29950013 PMCID: PMC6022561 DOI: 10.1093/bioinformatics/bty262] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
Motivation: The rapid drop in sequencing costs has produced many more (predicted) protein sequences than can feasibly be functionally annotated with wet-lab experiments. Thus, many computational methods have been developed for this purpose. Most of these methods employ homology-based inference, approximated via sequence alignments, to transfer functional annotations between proteins. The increase in the number of available sequences, however, has drastically increased the search space, thus significantly slowing down alignment methods.
Results: Here we describe homology-derived functional similarity of proteins (HFSP), a novel computational method that uses results of a high-speed alignment algorithm, MMseqs2, to infer functional similarity of proteins on the basis of their alignment length and sequence identity. We show that our method is accurate (85% precision) and fast (more than 40-fold speed increase over state-of-the-art). HFSP can help correct at least a 16% error in legacy curations, even for a resource of as high quality as Swiss-Prot. These findings suggest HFSP as an ideal resource for large-scale functional annotation efforts.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yannick Mahlich
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
  - Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Munich, Germany
  - Institute for Advanced Study, Technical University of Munich (TUM), Munich, Germany
- Martin Steinegger
  - Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Munich, Germany
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
  - Department of Chemistry, Seoul National University, Seoul, Korea
- Burkhard Rost
  - Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Munich, Germany
  - Institute for Advanced Study, Technical University of Munich (TUM), Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Technical University Munich (TUM), Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - New York Consortium on Membrane Protein Structure (NYCOMPS), New York, NY, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
  - Institute for Advanced Study, Technical University of Munich (TUM), Munich, Germany
  - Department of Genetics, Human Genetics Institute, Rutgers University, Piscataway, NJ, USA
|
22
|
Konaté MM, Plata G, Park J, Usmanova DR, Wang H, Vitkup D. Molecular function limits divergent protein evolution on planetary timescales. eLife 2019; 8:e39705. [PMID: 31532392 PMCID: PMC6750897 DOI: 10.7554/elife.39705] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/29/2018] [Accepted: 08/07/2019] [Indexed: 01/25/2023]
Abstract
Functional conservation is known to constrain protein evolution. Nevertheless, the long-term divergence patterns of proteins maintaining the same molecular function and the possible limits of this divergence have not been explored in detail. We investigate these fundamental questions by characterizing the divergence between ancient protein orthologs with conserved molecular function. Our results demonstrate that the decline of sequence and structural similarities between such orthologs significantly slows down after ~1-2 billion years of independent evolution. As a result, the sequence and structural similarities between ancient orthologs have not substantially decreased for the past billion years. The effective divergence limit (>25% sequence identity) is not primarily due to protein sites universally conserved in all lineages. Instead, fewer than four amino acid types are accepted, on average, per site across orthologous protein sequences. Our analysis also reveals different divergence patterns for protein sites with experimentally determined small and large fitness effects of mutations.
Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
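The divergence limit in this abstract is stated in terms of percent sequence identity. As a minimal sketch of one common way to compute that quantity, the function below counts identical residues over the columns of a pairwise alignment in which neither sequence has a gap. The function name `percent_identity`, the choice of denominator, and the toy sequences are assumptions made for illustration; other definitions of identity exist, and the paper's exact procedure is not reproduced here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    # With no comparable columns, report zero rather than divide by zero.
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment, invented for illustration only.
print(round(percent_identity("MKT-AYL", "MRTQAFL"), 1))  # 66.7
```

Under this definition, a pair of orthologs at the reported limit would share identical residues at roughly one quarter of their aligned, gap-free positions.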
Affiliation(s)
- Mariam M Konaté
  - Department of Systems Biology, Columbia University, New York, United States
  - Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, United States
- Germán Plata
  - Department of Systems Biology, Columbia University, New York, United States
- Jimin Park
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Pathology and Cell Biology, Columbia University, New York, United States
- Dinara R Usmanova
  - Department of Systems Biology, Columbia University, New York, United States
- Harris Wang
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Pathology and Cell Biology, Columbia University, New York, United States
- Dennis Vitkup
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Biomedical Informatics, Columbia University, New York, United States
|
23
|
Zhu C, Mahlich Y, Miller M, Bromberg Y. fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks. Nucleic Acids Res 2019; 46:D535-D541. [PMID: 29112720 PMCID: PMC5753390 DOI: 10.1093/nar/gkx1060] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/11/2017] [Accepted: 10/22/2017] [Indexed: 11/14/2022]
Abstract
Microbial functional diversification is driven by environmental factors, i.e. microorganisms inhabiting the same environmental niche tend to be more functionally similar than those from different environments. In some cases, even closely phylogenetically related microbes differ more across environments than across taxa. While microbial similarities are often reported in terms of taxonomic relationships, no existing databases directly link microbial functions to the environment. We previously developed a method for comparing microbial functional similarities on the basis of proteins translated from their sequenced genomes. Here, we describe fusionDB, a novel database that uses our functional data to represent 1374 taxonomically distinct bacteria annotated with available metadata: habitat/niche, preferred temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome and individual microbes are connected via common functions. Users can search fusionDB via combinations of organism names and metadata. Moreover, the web interface allows mapping new microbial genomes to the functional spectrum of reference bacteria, rendering interactive similarity networks that highlight shared functionality. fusionDB provides a fast means of comparing microbes, identifying potential horizontal gene transfer events, and highlighting key environment-specific functionality.
Affiliation(s)
- Chengsheng Zhu
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Yannick Mahlich
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
  - Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Technical University of Munich (TUM), 85748 Garching/Munich, Germany
  - Institute for Advanced Study, Technical University of Munich (TUM), Lichtenbergstrasse 2 a, 85748 Garching/Munich, Germany
- Maximilian Miller
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
  - Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Technical University of Munich (TUM), 85748 Garching/Munich, Germany
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
  - Institute for Advanced Study, Technical University of Munich (TUM), Lichtenbergstrasse 2 a, 85748 Garching/Munich, Germany
|
24
|
Zhu C, Miller M, Marpaka S, Vaysberg P, Rühlemann MC, Wu G, Heinsen FA, Tempel M, Zhao L, Lieb W, Franke A, Bromberg Y. Functional sequencing read annotation for high precision microbiome analysis. Nucleic Acids Res 2019; 46:e23. [PMID: 29194524 PMCID: PMC5829635 DOI: 10.1093/nar/gkx1209] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/30/2017] [Accepted: 11/27/2017] [Indexed: 01/16/2023]
Abstract
The vast majority of microorganisms on Earth reside in often-inseparable environment-specific communities—microbiomes. Meta-genomic/-transcriptomic sequencing could reveal the otherwise inaccessible functionality of microbiomes. However, existing analytical approaches focus on attributing sequencing reads to known genes/genomes, often failing to make maximal use of available data. We created faser (functional annotation of sequencing reads), an algorithm that is optimized to map reads to molecular functions encoded by the read-correspondent genes. The mi-faser microbiome analysis pipeline, combining faser with our manually curated reference database of protein functions, accurately annotates microbiome molecular functionality. mi-faser’s minutes-per-microbiome processing speed is significantly faster than that of other methods, allowing for large scale comparisons. Microbiome function vectors can be compared between different conditions to highlight environment-specific and/or time-dependent changes in functionality. Here, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader–Willi syndrome. Our method also revealed variability in Crohn's disease (CD) patient microbiomes and clearly distinguished them from those of related healthy individuals. Our analysis highlighted the microbiome role in CD pathogenicity, demonstrating enrichment of patient microbiomes in functions that promote inflammation and that help bacteria survive it.
Affiliation(s)
- Chengsheng Zhu
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA; Department for Bioinformatics and Computational Biology, Technische Universität München, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Technische Universität München, 85748 Garching/Munich, Germany
| | - Srinayani Marpaka
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
| | - Pavel Vaysberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
| | - Malte C Rühlemann
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
| | - Guojun Wu
- State Key Laboratory of Microbial Metabolism and Ministry of Education Key Laboratory of Systems Biomedicine, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | | | - Marie Tempel
- Institute of Epidemiology, Kiel University, Kiel, Germany
| | - Liping Zhao
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA; State Key Laboratory of Microbial Metabolism and Ministry of Education Key Laboratory of Systems Biomedicine, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Canadian Institute for Advanced Research, Toronto, Canada
| | - Wolfgang Lieb
- Institute of Epidemiology, Kiel University, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA; Department of Genetics, Rutgers University, Human Genetics Institute, Life Sciences Building, 145 Bevier Road, Piscataway, NJ 08854, USA; Institute for Advanced Study, Technische Universität München (TUM-IAS), Lichtenbergstrasse 2 a, D-85748 Garching, Germany
|
25
|
Yunes JM, Babbitt PC. Effusion: prediction of protein function from sequence similarity networks. Bioinformatics 2019; 35:442-451. [PMID: 30084920 PMCID: PMC6361244 DOI: 10.1093/bioinformatics/bty672] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/25/2018] [Revised: 07/24/2018] [Accepted: 07/30/2018] [Indexed: 12/26/2022] Open
Abstract
Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods. Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion's model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion's performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power. Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jeffrey M Yunes
- UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
|
26
|
Wright ES, Baum DA. Exclusivity offers a sound yet practical species criterion for bacteria despite abundant gene flow. BMC Genomics 2018; 19:724. [PMID: 30285620 PMCID: PMC6171291 DOI: 10.1186/s12864-018-5099-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/01/2018] [Accepted: 09/21/2018] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization. RESULTS: Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species. CONCLUSIONS: The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.
Affiliation(s)
- Erik S Wright
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA.
- Pittsburgh Center for Evolutionary Biology and Medicine, Pittsburgh, USA.
| | - David A Baum
- Department of Botany, University of Wisconsin-Madison, Madison, USA
|
27
|
Hönigschmid P, Bykova N, Schneider R, Ivankov D, Frishman D. Evolutionary Interplay between Symbiotic Relationships and Patterns of Signal Peptide Gain and Loss. Genome Biol Evol 2018; 10:928-938. [PMID: 29608732 PMCID: PMC5952966 DOI: 10.1093/gbe/evy049] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 03/02/2018] [Indexed: 01/18/2023] Open
Abstract
Can orthologous proteins differ in terms of their ability to be secreted? To answer this question, we investigated the distribution of signal peptides within the orthologous groups of Enterobacterales. Parsimony analysis and sequence comparisons revealed a large number of signal peptide gain and loss events, in which signal peptides emerge or disappear in the course of evolution. Signal peptide losses prevail over gains, an effect which is especially pronounced in the transition from the free-living or commensal to the endosymbiotic lifestyle. The disproportionate decline in the number of signal peptide-containing proteins in endosymbionts cannot be explained by the overall reduction of their genomes. Signal peptides can be gained and lost either by acquisition/elimination of the corresponding N-terminal regions or by gradual accumulation of mutations. The evolutionary dynamics of signal peptides in bacterial proteins represents a powerful mechanism of functional diversification.
Affiliation(s)
- Peter Hönigschmid
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Nadya Bykova
- Institute for Information Transmission Problems (Kharkevich Institute), RAS, Moscow, Russia
| | - René Schneider
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Dmitry Ivankov
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany; Laboratory of Bioinformatics, RASA Research Center, St. Petersburg State Polytechnical University, Russia
|
28
|
Shin JH, Eom H, Song WJ, Rho M. Integrative metagenomic and biochemical studies on rifamycin ADP-ribosyltransferases discovered in the sediment microbiome. Sci Rep 2018; 8:12143. [PMID: 30108275 PMCID: PMC6092378 DOI: 10.1038/s41598-018-30547-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/09/2018] [Accepted: 07/30/2018] [Indexed: 11/23/2022] Open
Abstract
Antibiotic resistance is a serious and growing threat to human health. The environmental microbiome is a rich reservoir of resistomes, offering opportunities to discover new antibiotic resistance genes. Here we demonstrate an integrative approach of utilizing gene sequence and protein structural information to characterize unidentified genes that are responsible for the resistance to the action of rifamycin antibiotic rifampin, a first-line antimicrobial agent to treat tuberculosis. Biochemical characterization of four environmental metagenomic proteins indicates that they are adenosine diphosphate (ADP)-ribosyltransferases and effective in the development of resistance to FDA-approved rifamycins. Our analysis suggests that even a single residue with low sequence conservation plays an important role in regulating the degrees of antibiotic resistance. In addition to advancing our understanding of antibiotic resistomes, this work demonstrates the importance of an integrative approach to discover new metagenomic genes and decipher their biochemical functions.
Affiliation(s)
- Jae Hong Shin
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea
| | - Hyunuk Eom
- Department of Chemistry, Seoul National University, Seoul, 08826, Korea
| | - Woon Ju Song
- Department of Chemistry, Seoul National University, Seoul, 08826, Korea.
| | - Mina Rho
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea.
- Department of Biomedical Informatics, Hanyang University, Seoul, Korea.
|
29
|
Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018; 7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 07/19/2018] [Indexed: 11/20/2022] Open
Abstract
Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research to fill in this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas informed by an ever-increasing assembly of scientific domains, from biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and more recently data sciences. We review the recent progress arising from this integration of knowledge, from the development of specific computer architecture to allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on detection of coevolution using very large data sets of aligned protein sequences.
Affiliation(s)
- Marc Delarue
- Unité Dynamique Structurale des Macromolécules, Institut Pasteur, and UMR 3528 du CNRS, Paris, France
| | - Patrice Koehl
- Department of Computer Science, Genome Center, University of California, Davis, Davis, California, USA
|
30
|
Hüdig M, Schmitz J, Engqvist MKM, Maurino VG. Biochemical control systems for small molecule damage in plants. Plant Signal Behav 2018; 13:e1477906. [PMID: 29944438 PMCID: PMC6103286 DOI: 10.1080/15592324.2018.1477906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 05/29/2023]
Abstract
As a system, plant metabolism is far from perfect: small molecules (metabolites, cofactors, coenzymes, and inorganic molecules) are frequently damaged by unwanted enzymatic or spontaneous reactions. Here, we discuss the emerging principles in small molecule damage biology. We propose that plants evolved at least three distinct systems to control small molecule damage: (i) repair, which returns a damaged molecule to its original state; (ii) scavenging, which converts reactive molecules to harmless products; and (iii) steering, in which the possible formation of a damaged molecule is suppressed. We illustrate the concept of small molecule damage control in plants by describing specific examples for each of these three categories. We highlight interesting insights that we expect future research will provide on those systems, and we discuss promising strategies to discover new small molecule damage-control systems in plants.
Affiliation(s)
- M. Hüdig
- Plant Molecular Physiology and Biotechnology Group, Institute of Developmental and Molecular Biology of Plants, Heinrich Heine University, and Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
| | - J. Schmitz
- Plant Molecular Physiology and Biotechnology Group, Institute of Developmental and Molecular Biology of Plants, Heinrich Heine University, and Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
| | - M. K. M. Engqvist
- Department of Biology and Biological engineering, Division of Systems and Synthetic Biology, Chalmers University of Technology, Gothenburg, Sweden
| | - V. G. Maurino
- Plant Molecular Physiology and Biotechnology Group, Institute of Developmental and Molecular Biology of Plants, Heinrich Heine University, and Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
|
31
|
Mills CL, Garg R, Lee JS, Tian L, Suciu A, Cooperman GD, Beuning PJ, Ondrechen MJ. Functional classification of protein structures by local structure matching in graph representation. Protein Sci 2018; 27:1125-1135. [PMID: 29604149 PMCID: PMC5980557 DOI: 10.1002/pro.3416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/21/2017] [Revised: 03/21/2018] [Accepted: 03/26/2018] [Indexed: 11/08/2022]
Abstract
As a result of high‐throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP‐Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP‐Func method to our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6‐Hairpin Glycosidase (6‐HG), and Concanavalin A‐like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP‐Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP‐Func methods to predict function. Forty‐one SG proteins in the RPBB superfamily, nine SG proteins in the 6‐HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community.
Affiliation(s)
- Caitlyn L Mills
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
| | - Rohan Garg
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts
| | - Joslynn S Lee
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
| | - Liang Tian
- Department of Mathematics, Northeastern University, Boston, Massachusetts
| | - Alexandru Suciu
- Department of Mathematics, Northeastern University, Boston, Massachusetts
| | - Gene D Cooperman
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts
| | - Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
| | - Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
|
32
|
Affiliation(s)
- Jacquelyn S. Fetrow
- Office of the President, Albright College, Reading, Pennsylvania, United States of America
| | - Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
|
33
|
Bennett BD, Redford KE, Gralnick JA. MgtE Homolog FicI Acts as a Secondary Ferrous Iron Importer in Shewanella oneidensis Strain MR-1. Appl Environ Microbiol 2018; 84:e01245-17. [PMID: 29330185 PMCID: PMC5835737 DOI: 10.1128/aem.01245-17] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/03/2017] [Accepted: 01/05/2018] [Indexed: 01/28/2023] Open
Abstract
The transport of metals into and out of cells is necessary for the maintenance of appropriate intracellular concentrations. Metals are needed for incorporation into metalloproteins but become toxic at higher concentrations. Many metal transport proteins have been discovered in bacteria, including the Mg2+ transporter E (MgtE) family of passive Mg2+/Co2+ cation-selective channels. Low sequence identity exists between members of the MgtE family, indicating that substrate specificity may differ among MgtE transporters. Under anoxic conditions, dissimilatory metal-reducing bacteria, such as Shewanella and Geobacter species, are exposed to high levels of soluble metals, including Fe2+ and Mn2+. Here we characterize SO_3966, which encodes an MgtE homolog in Shewanella oneidensis that we name FicI (ferrous iron and cobalt importer) based on its role in maintaining metal homeostasis. A SO_3966 deletion mutant exhibits enhanced growth over that of the wild type under conditions with high Fe2+ or Co2+ concentrations but exhibits wild-type Mg2+ transport and retention phenotypes. Conversely, deletion of feoB, which encodes an energy-dependent Fe2+ importer, causes a growth defect under conditions of low Fe2+ concentrations but not high Fe2+ concentrations. We propose that FicI represents a secondary, less energy-dependent mechanism for iron uptake by S. oneidensis under high Fe2+ concentrations. IMPORTANCE: Shewanella oneidensis MR-1 is a target of microbial engineering for potential uses in biotechnology and the bioremediation of heavy-metal-contaminated environments. A full understanding of the ways in which S. oneidensis interacts with metals, including the means by which it transports metal ions, is important for optimal genetic engineering of this and other organisms for biotechnology purposes such as biosorption. The MgtE family of metal importers has been described previously as Mg2+ and Co2+ transporters. This work broadens that designation with the discovery of an MgtE homolog in S. oneidensis that imports Fe2+ but not Mg2+. The research presented here also expands our knowledge of the means by which microorganisms have adapted to take up essential nutrients such as iron under various conditions.
Affiliation(s)
- Brittany D Bennett
- BioTechnology Institute and Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
| | - Kaitlyn E Redford
- BioTechnology Institute and Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
| | - Jeffrey A Gralnick
- BioTechnology Institute and Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
|
34
|
Zhang C, Zheng W, Freddolino PL, Zhang Y. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping. J Mol Biol 2018. [PMID: 29534977 DOI: 10.1016/j.jmb.2018.03.004] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022]
Abstract
Homology-based transferal remains the major approach to computational protein function annotation, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/.
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Peter L Freddolino
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
|
35
|
Abstract
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and the number of proteins whose function is annotated with high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small-molecule ligand binding sites and ligand structures using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA. We also present a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
Affiliation(s)
- Hui Sun Lee
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, 18015, USA.
| | - Wonpil Im
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, 18015, USA.
|
36
|
Kikuchi A, Okuyama M, Kato K, Osaki S, Ma M, Kumagai Y, Matsunaga K, Klahan P, Tagami T, Yao M, Kimura A. A novel glycoside hydrolase family 97 enzyme: Bifunctional β-L-arabinopyranosidase/α-galactosidase from Bacteroides thetaiotaomicron. Biochimie 2017; 142:41-50. [DOI: 10.1016/j.biochi.2017.08.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/21/2017] [Accepted: 08/07/2017] [Indexed: 10/19/2022]
|
37
|
Das S, Bhadra P, Ramakumar S, Pal D. Molecular Dynamics Information Improves cis-Peptide-Based Function Annotation of Proteins. J Proteome Res 2017. [PMID: 28633522 DOI: 10.1021/acs.jproteome.7b00217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
cis-Peptide bonds, whose occurrence in proteins is rare but evolutionarily conserved, are implicated to play an important role in protein function. This has led to their previous use in a homology-independent, fragment-match-based protein function annotation method. However, proteins are not static molecules; dynamics is integral to their activity. This is nicely epitomized by the geometric isomerization of cis-peptide to trans form for molecular activity. Hence we have incorporated both static (cis-peptide) and dynamics information to improve the prediction of protein molecular function. Our results show that cis-peptide information alone cannot detect functional matches in cases where cis-trans isomerization exists but 3D coordinates have been obtained for only the trans isomer or when the cis-peptide bond is incorrectly assigned as trans. On the contrary, use of dynamics information alone includes false-positive matches for cases where fragments with similar secondary structure show similar dynamics, but the proteins do not share a common function. Combining the two methods reduces errors while detecting the true matches, thereby enhancing the utility of our method in function annotation. A combined approach, therefore, opens up new avenues of improving existing automated function annotation methodologies.
Affiliation(s)
- Sreetama Das
- Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| | - Pratiti Bhadra
- Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| | - Suryanarayanarao Ramakumar
- Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| | - Debnath Pal
- Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
|
38
|
Ruiz-Blanco YB, Agüero-Chapin G, García-Hernández E, Álvarez O, Antunes A, Green J. Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinformatics 2017; 18:349. [PMID: 28732462 PMCID: PMC5521120 DOI: 10.1186/s12859-017-1758-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 07/13/2017] [Indexed: 11/10/2022] Open
Affiliation(s)
- Yasser B Ruiz-Blanco
- Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, 54830, Santa Clara, Cuba
- Theoretical Chemistry, Max-Planck-Institut für Kohlenforschung, 45470, Mülheim an der Ruhr, Germany
- Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Enrique García-Hernández
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), 04360, D.F., Mexico
- Orlando Álvarez
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
|
39
|
Burns JA, Zhang H, Hill E, Kim E, Kerney R. Transcriptome analysis illuminates the nature of the intracellular interaction in a vertebrate-algal symbiosis. eLife 2017; 6:e22054. [PMID: 28462779 PMCID: PMC5413350 DOI: 10.7554/elife.22054] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/03/2016] [Accepted: 03/15/2017] [Indexed: 12/19/2022] Open
Abstract
During embryonic development, cells of the green alga Oophila amblystomatis enter cells of the salamander Ambystoma maculatum forming an endosymbiosis. Here, using de novo dual-RNA seq, we compared the host salamander cells that harbored intracellular algae to those without algae and the algae inside the animal cells to those in the egg capsule. This two-by-two-way analysis revealed that intracellular algae exhibit hallmarks of cellular stress and undergo a striking metabolic shift from oxidative metabolism to fermentation. Culturing experiments with the alga showed that host glutamine may be utilized by the algal endosymbiont as a primary nitrogen source. Transcriptional changes in salamander cells suggest an innate immune response to the alga, with potential attenuation of NF-κB, and metabolic alterations indicative of modulation of insulin sensitivity. In stark contrast to its algal endosymbiont, the salamander cells did not exhibit major stress responses, suggesting that the host cell experience is neutral or beneficial.
Affiliation(s)
- John A Burns
- Division of Invertebrate Zoology, American Museum of Natural History, New York, United States
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, United States
- Huanjia Zhang
- Department of Biology, Gettysburg College, Gettysburg, United States
- Elizabeth Hill
- Department of Biology, Gettysburg College, Gettysburg, United States
- Eunsoo Kim
- Division of Invertebrate Zoology, American Museum of Natural History, New York, United States
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, United States
- Ryan Kerney
- Department of Biology, Gettysburg College, Gettysburg, United States
|
40
|
Veprinskiy V, Heizinger L, Plach MG, Merkl R. Assessing in silico the recruitment and functional spectrum of bacterial enzymes from secondary metabolism. BMC Evol Biol 2017; 17:36. [PMID: 28125959 PMCID: PMC5270213 DOI: 10.1186/s12862-017-0886-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/14/2016] [Accepted: 01/16/2017] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Microbes, plants, and fungi synthesize an enormous number of metabolites exhibiting rich chemical diversity. For a high-level classification, metabolism is subdivided into primary (PM) and secondary (SM) metabolism. SM products are often not essential for survival of the organism, and it is generally assumed that SM enzymes stem from PM homologs.
RESULTS: We wanted to assess evolutionary relationships and function of bona fide bacterial PM and SM enzymes. Thus, we analyzed the content of 1010 biosynthetic gene clusters (BGCs) from the MIBiG dataset; the encoded bacterial enzymes served as representatives of SM. The content of 15 bacterial genomes known not to harbor BGCs served as a representation of PM. Enzymes were categorized by their EC number, and frequencies were determined for these enzyme functions. The comparison of PM/SM frequencies indicates a certain preference for hydrolases (EC class 3) and ligases (EC class 6) in PM and for oxidoreductases (EC class 1) and lyases (EC class 4) in SM. Based on BLAST searches, we determined pairs of PM/SM homologs and their functional diversity. Oxidoreductases, transferases (EC class 2), lyases, and isomerases (EC class 5) form a tightly interlinked network, indicating that many protein folds can accommodate different functions in PM and SM. In contrast, the functional diversity of hydrolases and especially ligases is significantly limited in PM and SM. For the most direct comparison of PM/SM homologs, we restricted the search for each BGC to the content of the genome it comes from. For each homologous hit, the contribution of the genomic neighborhood to metabolic pathways was summarized in BGC-specific html-pages that are interlinked with KEGG; this dataset can be downloaded from https://www.bioinf.ur.de.
CONCLUSIONS: Only a few reaction chemistries are overrepresented in bacterial SM, and at least 55% of the enzymatic functions present in BGCs possess PM homologs. Many SM enzymes arose in PM, and Nature utilized the evolvability of enzymes similarly to establish novel functions both in PM and SM. Future work aimed at the elucidation of evolutionary routes that have interconverted a PM enzyme into an SM homolog can profit from our BGC-specific annotations.
Affiliation(s)
- Valery Veprinskiy
- Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany
- Leonhard Heizinger
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
- Maximilian G Plach
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
|
41
|
Weißenborn S, Walther D. Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling-A Feasibility Study. Front Plant Sci 2017; 8:1831. [PMID: 29163570 PMCID: PMC5664361 DOI: 10.3389/fpls.2017.01831] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/20/2017] [Accepted: 10/10/2017] [Indexed: 05/19/2023]
Abstract
Despite the many experimental and computational approaches developed, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species and the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity, suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. This feasibility study highlights the potential and challenges of phylogenetic profiling methods for detecting functional relationships between genes, the need to enlarge the set of plant genes with proven secondary metabolism involvement, and the limitations of distinct pathways as abstractions of relationships between genes.
|
42
|
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
|
43
|
De-novo protein function prediction using DNA binding and RNA binding proteins as a test case. Nat Commun 2016; 7:13424. [PMID: 27869118 PMCID: PMC5121330 DOI: 10.1038/ncomms13424] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 03/14/2016] [Accepted: 10/03/2016] [Indexed: 12/14/2022] Open
Abstract
Of the currently identified protein sequences, 99.6% have never been observed in the laboratory as proteins and their molecular function has not been established experimentally. Predicting the function of such proteins relies mostly on annotated homologs. However, this has resulted in some erroneous annotations, and many proteins have no annotated homologs. Here we propose a de-novo function prediction approach based on identifying biophysical features that underlie function. Using our approach, we discover DNA and RNA binding proteins that cannot be identified based on homology and validate these predictions experimentally. For example, FGF14, which belongs to a family of secreted growth factors, was predicted to bind DNA. We verify this experimentally and also show that FGF14 is localized to the nucleus. Mutating the predicted binding site on FGF14 abrogated DNA binding. These results demonstrate the feasibility of automated de-novo function prediction based on identifying function-related biophysical features.
Identification of the function of proteins is difficult when there are no structurally or biochemically characterized homologs. Here, the authors present an approach that allows the prediction of nucleic-acid binding proteins based on sequence alone, and they are able to experimentally validate their method.
|
44
|
Harel A, Häggblom MM, Falkowski PG, Yee N. Evolution of prokaryotic respiratory molybdoenzymes and the frequency of their genomic co-occurrence. FEMS Microbiol Ecol 2016; 92:fiw187. [DOI: 10.1093/femsec/fiw187] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Accepted: 09/05/2016] [Indexed: 02/03/2023] Open
|
45
|
Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016; 34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]
|
46
|
Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016; 590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Affiliation(s)
- Burkhard Rost
- Department of Informatics and Bioinformatics, Institute for Advanced Studies, Technical University of Munich, Garching, Germany
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
|
47
|
Morya VK, Yadav VK, Yadav S, Yadav D. Active Site Characterization of Proteases Sequences from Different Species of Aspergillus. Cell Biochem Biophys 2016; 74:327-35. [PMID: 27358183 DOI: 10.1007/s12013-016-0750-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/14/2013] [Accepted: 06/10/2016] [Indexed: 11/30/2022]
Abstract
A total of 129 protease sequences comprising 43 serine proteases, 36 aspartic proteases, 24 cysteine proteases, 21 metalloproteases, and 5 neutral proteases from different Aspergillus species were analyzed for catalytically active site residues using the MEROPS database and various bioinformatics tools. Different proteases showed a predominance of different active site residues. In the case of the 24 cysteine proteases of Aspergilli, the predominant active site residues observed were Gln193, Cys199, His364, and Asn384, while for the 43 serine proteases, the active site residue sets Asp164, His193, Asn284, Ser349 and Asp325, His357, Asn454, Ser519 were frequently observed. The analysis of the 21 metalloproteases of Aspergilli revealed Glu298 and Glu388, Tyr476 as predominant active site residues. In general, Aspergillus species-specific active site residues were observed for the different types of protease sequences analyzed. The phylogenetic analysis of these 129 protease sequences revealed 14 different clans representing different types of proteases with diverse active site residues.
Affiliation(s)
- V K Morya
- Department of Biological Engineering, Inha University, Incheon, 42-751, Republic of Korea
- Virendra K Yadav
- Department of Molecular and Cellular Engineering, SHIATS, Allahabad, India
- Sangeeta Yadav
- Department of Biotechnology, D.D.U. Gorakhpur University, Gorakhpur, 273009, India
- Dinesh Yadav
- Department of Biotechnology, D.D.U. Gorakhpur University, Gorakhpur, 273009, India
|
48
|
Xu Y, Ma Y, Yao S, Jiang Z, Pei J, Cheng C. Characterization, Genome Sequence, and Analysis of Escherichia Phage CICC 80001, a Bacteriophage Infecting an Efficient L-Aspartic Acid Producing Escherichia coli. Food Environ Virol 2016; 8:18-26. [PMID: 26501200 DOI: 10.1007/s12560-015-9218-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/03/2015] [Accepted: 10/22/2015] [Indexed: 06/05/2023]
Abstract
Escherichia phage CICC 80001 was isolated from the bacteriophage-contaminated medium of an Escherichia coli strain, HY-05C (CICC 11022S), which could produce L-aspartic acid. The phage had a head diameter of 45-50 nm and a tail of about 10 nm. The one-step growth curve showed a latent period of 10 min and a rise period of about 20 min. The average burst size was about 198 phage particles per infected cell. Tests were conducted on the plaques, multiplicity of infection, and host range. The genome of CICC 80001 was sequenced, with a length of 38,810 bp, and annotated. The key proteins leading to host-cell lysis were phylogenetically analyzed. One protein belonged to the class II holins, and the other two belonged to the endopeptidase family and the N-acetylmuramoyl-L-alanine amidase family, respectively. The genome showed a sequence identity of 82.7% with that of Enterobacteria phage T7 and carried ten unique open reading frames. A bacteriophage-resistant E. coli strain, designated CICC 11021S, was bred, and its L-aspartase activity was 84.4% of that of CICC 11022S.
Affiliation(s)
- Youqiang Xu
- China Center of Industrial Culture Collection, China National Research Institute of Food and Fermentation Industries, Beijing, 100015, People's Republic of China
- Engineering Technology Research Center of Fumaric Acid Biotransformation in Shandong Province, Yantai, Shandong Province, 265709, People's Republic of China
- Yuyue Ma
- Engineering Technology Research Center of Fumaric Acid Biotransformation in Shandong Province, Yantai, Shandong Province, 265709, People's Republic of China
- Su Yao
- China Center of Industrial Culture Collection, China National Research Institute of Food and Fermentation Industries, Beijing, 100015, People's Republic of China
- Zengyan Jiang
- Engineering Technology Research Center of Fumaric Acid Biotransformation in Shandong Province, Yantai, Shandong Province, 265709, People's Republic of China
- Jiangsen Pei
- China Center of Industrial Culture Collection, China National Research Institute of Food and Fermentation Industries, Beijing, 100015, People's Republic of China
- Engineering Technology Research Center of Fumaric Acid Biotransformation in Shandong Province, Yantai, Shandong Province, 265709, People's Republic of China
- Chi Cheng
- China Center of Industrial Culture Collection, China National Research Institute of Food and Fermentation Industries, Beijing, 100015, People's Republic of China
|
49
|
Žváček C, Friedrichs G, Heizinger L, Merkl R. An assessment of catalytic residue 3D ensembles for the prediction of enzyme function. BMC Bioinformatics 2015; 16:359. [PMID: 26538500 PMCID: PMC4634577 DOI: 10.1186/s12859-015-0807-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/17/2015] [Accepted: 10/29/2015] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i.e., non-homologous, protein folds and how similar their 3D poses are. Both similarities are key criteria for assessing the usability of pose comparison for function prediction.
RESULTS: We have analyzed the SCOP database on the superfamily level in order to estimate the number of non-homologous enzymes possessing the same function according to their EC number. 89% of the 873 substrate-specific functions (four-digit EC number) assigned to mono-functional, single-domain enzymes were only found in one superfamily. For a reaction-specific grouping (three-digit EC number), this value dropped to 35%, indicating that in approximately 65% of all enzymes the same function evolved in two or more non-homologous proteins. For these isofunctional enzymes, structural similarity of the catalytic sites may help to predict function, because neither high sequence similarity nor identical folds are required for a comparison. To assess the specificity of catalytic 3D poses, we compiled the redundancy-free set ENZ_SITES, which comprises 695 sites whose composition and function are well-defined. We compared their poses with the help of the program Superpose3D and determined classification performance. If the sites were from different superfamilies, the number of true and false positive predictions was similarly high, both for a coarse and a detailed grouping of enzyme function. Moreover, classification performance did not improve drastically if we additionally used homologous sites to predict function.
CONCLUSIONS: For a large number of enzymatic functions, dissimilar sites evolved that catalyze the same reaction, and it is the individual substrate that determines the arrangement of the catalytic site and its local environment. These substrate-specific requirements turn the comparison of catalytic residues into a weak classifier for the prediction of enzyme function.
Affiliation(s)
- Clemens Žváček
- Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany
- Gerald Friedrichs
- Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany
- Leonhard Heizinger
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
|
50
|
Bennett BD, Brutinel ED, Gralnick JA. A Ferrous Iron Exporter Mediates Iron Resistance in Shewanella oneidensis MR-1. Appl Environ Microbiol 2015; 81:7938-44. [PMID: 26341213 PMCID: PMC4616933 DOI: 10.1128/aem.02835-15] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/31/2015] [Accepted: 09/02/2015] [Indexed: 11/20/2022] Open
Abstract
Shewanella oneidensis strain MR-1 is a dissimilatory metal-reducing bacterium frequently found in aquatic sediments. In the absence of oxygen, S. oneidensis can respire extracellular, insoluble oxidized metals, such as iron (hydr)oxides, making it intimately involved in environmental metal and nutrient cycling. The reduction of ferric iron (Fe(3+)) results in the production of ferrous iron (Fe(2+)) ions, which remain soluble under certain conditions and are toxic to cells at higher concentrations. We have identified an inner membrane protein in S. oneidensis, encoded by the gene SO_4475 and here called FeoE, which is important for survival during anaerobic iron respiration. FeoE, a member of the cation diffusion facilitator (CDF) protein family, functions to export excess Fe(2+) from the MR-1 cytoplasm. Mutants lacking feoE exhibit an increased sensitivity to Fe(2+). The export function of FeoE is specific for Fe(2+), as an feoE mutant is equally sensitive to other metal ions known to be substrates of other CDF proteins (Cd(2+), Co(2+), Cu(2+), Mn(2+), Ni(2+), or Zn(2+)). The substrate specificity of FeoE differs from that of FieF, the Escherichia coli homolog of FeoE, which has been reported to be a Cd(2+)/Zn(2+) or Fe(2+)/Zn(2+) exporter. A complemented feoE mutant has an increased growth rate in the presence of excess Fe(2+) compared to that of the ΔfeoE mutant complemented with fieF. It is possible that FeoE has evolved to become an efficient and specific Fe(2+) exporter in response to the high levels of iron often present in the types of environmental niches in which Shewanella species can be found.
Affiliation(s)
- Brittany D Bennett
- BioTechnology Institute and Department of Microbiology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
- Evan D Brutinel
- BioTechnology Institute and Department of Microbiology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
- Jeffrey A Gralnick
- BioTechnology Institute and Department of Microbiology, University of Minnesota-Twin Cities, St. Paul, Minnesota, USA
|