1. Moyer DC, Reimertz J, Segrè D, Fuxman Bass JI. Semi-Automatic Detection of Errors in Genome-Scale Metabolic Models. bioRxiv: The Preprint Server for Biology 2024:2024.06.24.600481. [PMID: 38979177 PMCID: PMC11230171 DOI: 10.1101/2024.06.24.600481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Genome-Scale Metabolic Models (GSMMs) are used for numerous tasks requiring computational estimates of metabolic fluxes, from predicting novel drug targets to engineering microbes to produce valuable compounds. A key limiting step in most applications of GSMMs is ensuring their representation of the target organism's metabolism is complete and accurate. Identifying and visualizing errors in GSMMs is complicated by the fact that they contain thousands of densely interconnected reactions. Furthermore, many errors in GSMMs only become apparent when considering pathways of connected reactions collectively, as opposed to examining reactions individually. Results We present Metabolic Accuracy Check and Analysis Workflow (MACAW), a collection of algorithms for detecting errors in GSMMs. The relative frequencies of errors we detect in manually curated GSMMs appear to reflect the different approaches used to curate them. Changing the method used to automatically create a GSMM from a particular organism's genome can have a larger impact on the kinds of errors in the resulting GSMM than using the same method with a different organism's genome. Our algorithms are particularly capable of identifying errors that are only apparent at the pathway level, including loops and nontrivial cases of dead ends. Conclusions MACAW is capable of identifying inaccuracies of varying severity in a wide range of GSMMs. Correcting these errors can measurably improve the predictive capacity of a GSMM. The relative prevalence of each type of error we identify in a large collection of GSMMs could help shape future efforts for further automation of error correction and GSMM creation.
2. Kwon JJ, Pan J, Gonzalez G, Hahn WC, Zitnik M. On knowing a gene: A distributional hypothesis of gene function. Cell Syst 2024; 15:488-496. [PMID: 38810640 PMCID: PMC11189734 DOI: 10.1016/j.cels.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 02/25/2024] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.
Affiliation(s)
- Jason J Kwon
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Pan
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guadalupe Gonzalez
- Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK
- William C Hahn
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA 02134, USA
3. Wu S, Guo JT. Improved prediction of DNA and RNA binding proteins with deep learning models. Brief Bioinform 2024; 25:bbae285. [PMID: 38856168 PMCID: PMC11163377 DOI: 10.1093/bib/bbae285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/20/2024] [Accepted: 05/31/2024] [Indexed: 06/11/2024]
Abstract
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
Affiliation(s)
- Siwen Wu
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
- Jun-tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
4. Zarin S, Shariq M, Rastogi N, Ahuja Y, Manjunath P, Alam A, Hasnain SE, Ehtesham NZ. Rv2231c, a unique histidinol phosphate aminotransferase from Mycobacterium tuberculosis, supports virulence by inhibiting host-directed defense. Cell Mol Life Sci 2024; 81:203. [PMID: 38698289 PMCID: PMC11065945 DOI: 10.1007/s00018-024-05200-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 02/02/2024] [Accepted: 03/04/2024] [Indexed: 05/05/2024]
Abstract
Nitrogen metabolism of M. tuberculosis is critical for its survival in infected host cells. M. tuberculosis has evolved sophisticated strategies to switch between de novo synthesis and uptake of various amino acids from host cells to meet its metabolic demands. The pyridoxal phosphate-dependent enzyme histidinol phosphate aminotransferase (HspAT) is critically required for histidine biosynthesis. HspAT is involved in the metabolic synthesis of histidine, phenylalanine, tyrosine, tryptophan, and novobiocin. We showed that M. tuberculosis Rv2231c is a conserved enzyme with HspAT activity. Rv2231c is a monomeric globular protein that contains α-helices and β-sheets. It is a secreted, cell wall-localized protein that regulates critical pathogenic attributes. Rv2231c enhances the survival and virulence of recombinant M. smegmatis in infected RAW264.7 macrophage cells. Rv2231c is recognized by the TLR4 innate immune receptor and modulates the host immune response by suppressing the secretion of the antibacterial pro-inflammatory cytokines TNF, IL-12, and IL-6. It also inhibits the expression of the co-stimulatory molecules CD80 and CD86, along with the antigen-presenting molecule MHC-I, on macrophages and suppresses reactive nitrogen species formation, thereby promoting M2 macrophage polarization. Recombinant M. smegmatis expressing Rv2231c inhibited apoptosis in macrophages, promoting efficient bacterial survival and proliferation, thereby increasing virulence. Our results indicate that Rv2231c is a moonlighting protein that regulates multiple functions of M. tuberculosis pathophysiology to increase its virulence. These mechanistic insights can be used to better understand the pathogenesis of M. tuberculosis and to design strategies for tuberculosis mitigation.
Affiliation(s)
- Sheeba Zarin
- Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, India
- Department of Life Science, School of Basic Sciences and Research, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
- Mohd Shariq
- Cell Signaling and Inflammation Biology Lab, ICMR-National Institute of Pathology, New Delhi, 110029, India
- Nilisha Rastogi
- Cell Signaling and Inflammation Biology Lab, ICMR-National Institute of Pathology, New Delhi, 110029, India
- Yashika Ahuja
- Department of Life Science, School of Basic Sciences and Research, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
- P Manjunath
- Cell Signaling and Inflammation Biology Lab, ICMR-National Institute of Pathology, New Delhi, 110029, India
- Anwar Alam
- Department of Biotechnology, School of Engineering and Technology, Sharda University, Greater Noida, 201310, India
- Seyed Ehtesham Hasnain
- Department of Life Science, School of Basic Sciences and Research, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology, New Delhi, 110016, India
- Nasreen Zafar Ehtesham
- Department of Life Science, School of Basic Sciences and Research, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
5. Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2024]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
- Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Department of Molecular Biosciences, Northwestern University, Evanston, United States
- Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
- Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
6. Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. bioRxiv: The Preprint Server for Biology 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
- Interdisciplinary Biological Sciences, Northwestern University
- Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
- Department of Physics and Astronomy, Northwestern University
- Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University
- The Potocsnak Longevity Institute, Northwestern University
- Simpson Querrey Lung Institute for Translational Science, Northwestern University
7. Scanlan JL, Robin C. Phylogenomics of the Ecdysteroid Kinase-like (EcKL) Gene Family in Insects Highlights Roles in Both Steroid Hormone Metabolism and Detoxification. Genome Biol Evol 2024; 16:evae019. [PMID: 38291829 PMCID: PMC10859841 DOI: 10.1093/gbe/evae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 11/21/2023] [Accepted: 01/23/2024] [Indexed: 02/01/2024]
Abstract
The evolutionary dynamics of large gene families can offer important insights into the functions of their individual members. While the ecdysteroid kinase-like (EcKL) gene family has previously been linked to the metabolism of both steroid molting hormones and xenobiotic toxins, the functions of nearly all EcKL genes are unknown, and there is little information on their evolution across all insects. Here, we perform comprehensive phylogenetic analyses on a manually annotated set of EcKL genes from 140 insect genomes, revealing that the gene family comprises at least 13 subfamilies that differ in retention and stability. Our results show that the only two genes known to encode ecdysteroid kinases belong to different subfamilies and therefore ecdysteroid metabolism functions must be spread throughout the EcKL family. We provide comparative phylogenomic evidence that EcKLs are involved in detoxification across insects, with positive associations between family size and dietary chemical complexity, and we also find similar evidence for the cytochrome P450 and glutathione S-transferase gene families. Unexpectedly, we find that the size of the clade containing a known ecdysteroid kinase is positively associated with host plant taxonomic diversity in Lepidoptera, possibly suggesting multiple functional shifts between hormone and xenobiotic metabolism. Our evolutionary analyses provide hypotheses of function and a robust framework for future experimental studies of the EcKL gene family. They also open promising new avenues for exploring the genomic basis of dietary adaptation in insects, including the classically studied coevolution of butterflies with their host plants.
Affiliation(s)
- Jack L Scanlan
- School of BioSciences, The University of Melbourne, Melbourne, VIC 3010, Australia
- Charles Robin
- School of BioSciences, The University of Melbourne, Melbourne, VIC 3010, Australia
8. Novikova PV, Bhanu Busi S, Probst AJ, May P, Wilmes P. Functional prediction of proteins from the human gut archaeome. ISME Communications 2024; 4:ycad014. [PMID: 38486809 PMCID: PMC10939349 DOI: 10.1093/ismeco/ycad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 12/16/2023] [Accepted: 12/19/2023] [Indexed: 03/17/2024]
Abstract
The human gastrointestinal tract contains diverse microbial communities, including archaea. Among them, Methanobrevibacter smithii represents a highly active and clinically relevant methanogenic archaeon, being involved in gastrointestinal disorders, such as inflammatory bowel disease and obesity. Herein, we present an integrated approach using sequence and structure information to improve the annotation of M. smithii proteins using advanced protein structure prediction and annotation tools, such as AlphaFold2, trRosetta, ProFunc, and DeepFri. Of an initial set of 873 481 archaeal proteins, we found 707 754 proteins exclusively present in the human gut. Having analysed archaeal proteins together with 87 282 994 bacterial proteins, we identified unique archaeal proteins and archaeal-bacterial homologs. We then predicted and characterized functional domains and structures of 73 unique and homologous archaeal protein clusters linked to the human gut and M. smithii. We refined annotations based on the predicted structures, extending existing sequence similarity-based annotations. We identified gut-specific archaeal proteins that may be involved in defense mechanisms, virulence, adhesion, and the degradation of toxic substances. Interestingly, we identified potential glycosyltransferases that could be associated with N-linked and O-glycosylation. Additionally, we found preliminary evidence for interdomain horizontal gene transfer between Clostridia species and M. smithii, which includes sporulation Stage V proteins AE and AD. Our study broadens the understanding of archaeal biology, particularly M. smithii, and highlights the importance of considering both sequence and structure for the prediction of protein function.
Affiliation(s)
- Polina V Novikova
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- Susheel Bhanu Busi
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- UK Centre for Ecology and Hydrology, Wallingford, OX10 8BB, United Kingdom
- Alexander J Probst
- Environmental Metagenomics, Department of Chemistry, Research Center One Health Ruhr of the University Alliance Ruhr, for Environmental Microbiology and Biotechnology, University Duisburg-Essen, Duisburg 47057, Germany
- Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- Paul Wilmes
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
9. Knoshaug EP, Sun P, Nag A, Nguyen H, Mattoon EM, Zhang N, Liu J, Chen C, Cheng J, Zhang R, St. John P, Umen J. Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis. Plant Direct 2023; 7:e527. [PMID: 38044962 PMCID: PMC10690477 DOI: 10.1002/pld3.527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/03/2023] [Accepted: 08/11/2023] [Indexed: 12/05/2023]
Abstract
The rapid accumulation of sequenced plant genomes in the past decade has outpaced the still difficult problem of genome-wide protein-coding gene annotation. A substantial fraction of protein-coding genes in all plant genomes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Setaria viridis (monocot), and Chlamydomonas reinhardtii (Chlorophyte alga). Using similarity searching, we identified a subset of unannotated proteins that were conserved between these species and defined them as Deep Green proteins. Bioinformatic, genomic, and structural predictions were performed to begin classifying Deep Green genes and proteins. Compared to whole proteomes for each species, the Deep Green set was enriched for proteins with predicted chloroplast targeting signals predictive of photosynthetic or plastid functions, a result that was consistent with enrichment for daylight phase diurnal expression patterning. Structural predictions using AlphaFold and comparisons to known structures showed that a significant proportion of Deep Green proteins may possess novel folds. Though only available for three organisms, the Deep Green genes and proteins provide a starting resource of high-value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.
Affiliation(s)
- Eric P. Knoshaug
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Peipei Sun
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Ambarish Nag
- Computational Sciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Huong Nguyen
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, USA
- Erin M. Mattoon
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Plant and Microbial Biosciences Program, Division of Biology and Biomedical Sciences, Washington University in Saint Louis, St. Louis, Missouri, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Ru Zhang
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Peter St. John
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- James Umen
- Donald Danforth Plant Science Center, St. Louis, MO, USA
10. Byrne KL, Szeligowski RV, Shen H. Phylogenetic Analysis Guides Transporter Protein Deorphanization: A Case Study of the SLC25 Family of Mitochondrial Metabolite Transporters. Biomolecules 2023; 13:1314. [PMID: 37759714 PMCID: PMC10526428 DOI: 10.3390/biom13091314] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/20/2023] [Revised: 08/13/2023] [Accepted: 08/14/2023] [Indexed: 09/29/2023]
Abstract
Homology search and phylogenetic analysis have commonly been used to annotate gene function, although they are prone to error. We hypothesize that the power of homology search in functional annotation depends on the coupling of sequence variation to functional diversification, and we herein focus on the SoLute Carrier (SLC25) family of mitochondrial metabolite transporters to survey this coupling in a family-wide manner. The SLC25 family is the largest family of mitochondrial metabolite transporters in eukaryotes. Its members translocate ligands with different chemical properties, ranging from nucleotides and amino acids to carboxylic acids and cofactors, and thus present ample experimentally validated functional diversification in ligand transport. Here, we use phylogenetic analysis to profile SLC25 transporters across common eukaryotic model organisms, from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Danio rerio to Homo sapiens, and assess their sequence adaptations to the transported ligands within individual subfamilies. Using several recently studied and poorly characterized SLC25 transporters, we discuss the potential and limitations of phylogenetic analysis in guiding functional characterization.
Affiliation(s)
- Katie L. Byrne
- Cellular and Molecular Physiology Department, Yale School of Medicine, New Haven, CT 06510, USA
- Systems Biology Institute, Yale West Campus, West Haven, CT 06516, USA
- Yale College, New Haven, CT 06511, USA
- Richard V. Szeligowski
- Cellular and Molecular Physiology Department, Yale School of Medicine, New Haven, CT 06510, USA
- Systems Biology Institute, Yale West Campus, West Haven, CT 06516, USA
- Hongying Shen
- Cellular and Molecular Physiology Department, Yale School of Medicine, New Haven, CT 06510, USA
- Systems Biology Institute, Yale West Campus, West Haven, CT 06516, USA
11. Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol 2023; 21:e3002222. [PMID: 37552676 PMCID: PMC10409296 DOI: 10.1371/journal.pbio.3002222] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/12/2023] [Accepted: 06/27/2023] [Indexed: 08/10/2023]
Abstract
The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.
Affiliation(s)
- João J. Rocha
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Tim J. Stevens
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Rajen D. Shah
- Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
- Sahar Emran
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Cristina Robles
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Matthew Freeman
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Sean Munro
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
12. Sanchez-Briñas A, Duran-Ruiz C, Astola A, Arroyo MM, Raposo FG, Valle A, Bolivar J. ZNF330/NOA36 interacts with HSPA1 and HSPA8 and modulates cell cycle and proliferation in response to heat shock in HEK293 cells. Biol Direct 2023; 18:26. [PMID: 37254218 DOI: 10.1186/s13062-023-00384-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 05/20/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND The human genome contains nearly 20,000 protein-coding genes, but more than 6,000 proteins are still poorly characterized. Among them, ZNF330/NOA36 stands out because it is a highly evolutionarily conserved nucleolar zinc-finger protein found in genomes ranging from ancient animal phyla, such as sponges and cnidarians, up to humans. First described as a human autoantigen, NOA36 is expressed in all tissues and human cell lines, and it has been related to apoptosis in human cells as well as to muscle morphogenesis and hematopoiesis in Drosophila. Nevertheless, further research is required to better understand the roles of this highly conserved protein. RESULTS Here, we have investigated possible interactors of human ZNF330/NOA36 through affinity-purification mass spectrometry (AP-MS). Among them, the interaction of NOA36 with the heat shock proteins HSPA1 and HSPA8 was identified and further validated by co-immunoprecipitation. In addition, "Enhancer of Rudimentary Homolog" (ERH), a protein involved in cell cycle regulation, was detected in the AP-MS approach. Furthermore, we developed a NOA36 knockout cell line using CRISPR/Cas9n in HEK293 cells, and we found that the cell cycle profile was modified and proliferation decreased after heat shock in the knockout cells. These differences were not due to differential expression, after stress induction, of the HSP genes detected in the AP-MS. CONCLUSIONS Our results indicate that NOA36 is necessary for proliferation recovery in response to thermal stress to achieve a regular cell cycle profile, likely by interaction with HSPA1 and HSPA8. Further studies will be required to determine the relevance of the NOA36-ERH interaction in this context.
Affiliation(s)
- Alejandra Sanchez-Briñas
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Carmen Duran-Ruiz
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
- Antonio Astola
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Institute of Biomolecules (INBIO), University of Cadiz, Cadiz, Spain
- Marta Marina Arroyo
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Fátima G Raposo
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Antonio Valle
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Institute of Viticulture and Agri-Food Research (IVAGRO) - International Campus of Excellence (ceiA3), University of Cadiz, Cadiz, Spain
- Jorge Bolivar
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain.
- Institute of Biomolecules (INBIO), University of Cadiz, Cadiz, Spain.

13
Favilli L, Griffith CM, Schymanski EL, Linster CL. High-throughput Saccharomyces cerevisiae cultivation method for credentialing-based untargeted metabolomics. Anal Bioanal Chem 2023:10.1007/s00216-023-04724-5. [PMID: 37212869 DOI: 10.1007/s00216-023-04724-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 04/24/2023] [Accepted: 04/28/2023] [Indexed: 05/23/2023]
Abstract
Identifying metabolites in model organisms is critical for many areas of biology, including unravelling disease aetiology or elucidating functions of putative enzymes. Even now, hundreds of predicted metabolic genes in Saccharomyces cerevisiae remain uncharacterized, indicating that our understanding of metabolism is far from complete even in well-characterized organisms. While untargeted high-resolution mass spectrometry (HRMS) enables the detection of thousands of features per analysis, many of these have a non-biological origin. Stable isotope labelling (SIL) approaches can serve as credentialing strategies to distinguish biologically relevant features from background signals, but implementing these experiments at large scale remains challenging. Here, we developed a SIL-based approach for high-throughput untargeted metabolomics in S. cerevisiae, including deep-48 well format-based cultivation and metabolite extraction, building on the peak annotation and verification engine (PAVE) tool. Aqueous and nonpolar extracts were analysed using HILIC and RP liquid chromatography, respectively, coupled to Orbitrap Q Exactive HF mass spectrometry. Of the approximately 37,000 total detected features, only 3-7% of the features were credentialed and used for data analysis with open-source software such as MS-DIAL, MetFrag, Shinyscreen, SIRIUS CSI:FingerID, and MetaboAnalyst, leading to the successful annotation of 198 metabolites using MS2 database matching. Comparable metabolic profiles were observed for wild-type and sdh1Δ yeast strains grown in deep-48 well plates versus the classical shake flask format, including the expected increase in intracellular succinate concentration in the sdh1Δ strain. The described approach enables high-throughput yeast cultivation and credentialing-based untargeted metabolomics, providing a means to efficiently perform molecular phenotypic screens and help complete metabolic networks.
Affiliation(s)
- Lorenzo Favilli
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Avenue du Swing 6, Belvaux, L-4367, Luxembourg.
- Corey M Griffith
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Avenue du Swing 6, Belvaux, L-4367, Luxembourg
- Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Avenue du Swing 6, Belvaux, L-4367, Luxembourg
- Carole L Linster
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Avenue du Swing 6, Belvaux, L-4367, Luxembourg

14
Bailoni E, Partipilo M, Coenradij J, Grundel DAJ, Slotboom DJ, Poolman B. Minimal Out-of-Equilibrium Metabolism for Synthetic Cells: A Membrane Perspective. ACS Synth Biol 2023; 12:922-946. [PMID: 37027340 PMCID: PMC10127287 DOI: 10.1021/acssynbio.3c00062] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/30/2023] [Indexed: 04/08/2023]
Abstract
Life-like systems need to maintain a basal metabolism, which includes importing a variety of building blocks required for macromolecule synthesis, exporting dead-end products, and recycling cofactors and metabolic intermediates, while maintaining steady internal physical and chemical conditions (physicochemical homeostasis). A compartment, such as a unilamellar vesicle, functionalized with membrane-embedded transport proteins and metabolic enzymes encapsulated in the lumen meets these requirements. Here, we identify four modules designed for a minimal metabolism in a synthetic cell with a lipid bilayer boundary: energy provision and conversion, physicochemical homeostasis, metabolite transport, and membrane expansion. We review design strategies that can be used to fulfill these functions with a focus on the lipid and membrane protein composition of a cell. We compare our bottom-up design with the equivalent essential modules of JCVI-syn3a, a top-down genome-minimized living cell with a size comparable to that of large unilamellar vesicles. Finally, we discuss the bottlenecks related to the insertion of a complex mixture of membrane proteins into lipid bilayers and provide a semiquantitative estimate of the relative surface area and lipid-to-protein mass ratios (i.e., the minimal number of membrane proteins) that are required for the construction of a synthetic cell.
Affiliation(s)
- Eleonora Bailoni
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Michele Partipilo
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Jelmer Coenradij
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Douwe A. J. Grundel
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Dirk J. Slotboom
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Bert Poolman
- Department of Biochemistry and Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands

15
Rhee KY, Jansen RS, Grundner C. Activity-based annotation: the emergence of systems biochemistry. Trends Biochem Sci 2022; 47:785-794. [PMID: 35430135 PMCID: PMC9378515 DOI: 10.1016/j.tibs.2022.03.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/11/2022] [Revised: 03/10/2022] [Accepted: 03/22/2022] [Indexed: 01/21/2023]
Abstract
Current tools to annotate protein function have failed to keep pace with the speed of DNA sequencing and exponentially growing number of proteins of unknown function (PUFs). A major contributing factor to this mismatch is the historical lack of high-throughput methods to experimentally determine biochemical activity. Activity-based methods, such as activity-based metabolite and protein profiling, are emerging as new approaches for unbiased, global, biochemical annotation of protein function. In this review, we highlight recent experimental, activity-based approaches that offer new opportunities to determine protein function in a biologically agnostic and systems-level manner.
Affiliation(s)
- Kyu Y Rhee
- Department of Medicine, Weill Cornell Medical College, New York, NY, USA.
- Robert S Jansen
- Department of Microbiology, Radboud University, Nijmegen, The Netherlands.
- Christoph Grundner
- Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA, USA; Department of Global Health, University of Washington, Seattle, WA, USA; Department of Pediatrics, University of Washington, Seattle, WA, USA.

16
Guo JT, Malik F. Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches. Biomolecules 2022; 12:biom12091187. [PMID: 36139026 PMCID: PMC9496475 DOI: 10.3390/biom12091187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/11/2022] [Accepted: 08/24/2022] [Indexed: 11/25/2022]
Abstract
Single-stranded DNA (ssDNA) binding proteins (SSBs) are critical for maintaining genome stability by protecting transiently exposed ssDNA from damage during essential biological processes such as DNA replication and gene transcription. The single-stranded region of telomeres also requires protection by SSBs from being attacked if it is wrongly recognized as an anomaly. In addition to their critical roles in genome stability and integrity, ssDNA and SSB-ssDNA interactions have been shown to play critical roles in transcriptional regulation in all three domains of life and in viruses. In this review, we present current knowledge of the structure and function of SSBs and of the structural features underlying SSB binding specificity. We then discuss the machine learning-based approaches that have been developed to distinguish SSBs from double-stranded DNA (dsDNA) binding proteins (DSBs).

17
Escudeiro P, Henry CS, Dias RP. Functional characterization of prokaryotic dark matter: the road so far and what lies ahead. Curr Res Microb Sci 2022; 3:100159. [PMID: 36561390 PMCID: PMC9764257 DOI: 10.1016/j.crmicr.2022.100159] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/11/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 12/25/2022]
Abstract
Between eight hundred thousand and one trillion prokaryotic species may inhabit our planet, yet fewer than two hundred thousand have been described. This uncharted fraction of microbial diversity, and its undisclosed coding potential, is known as "microbial dark matter" (MDM). Next-generation sequencing has made it possible to collect massive amounts of genome sequence data, leading to unprecedented advances in genomics. Still, extracting new functional information from the genomes of uncultured prokaryotes is often limited by standard classification methods, which typically rely on sequence similarity searches against reference genomes from cultured species. This hinders the discovery of unique genetic elements that are absent from the cultivated realm. It also contributes to the accumulation of prokaryotic gene products of unknown function in public sequence repositories, highlighting the need for new approaches to sequencing data analysis and classification. Increasing evidence indicates that these proteins of unknown function might be a treasure trove of biotechnological potential. Here, we outline the challenges, opportunities, and potential hidden within the functional dark matter (FDM) of prokaryotes. We also discuss the pitfalls of the molecular and computational approaches currently used to probe these uncharted waters, as well as future opportunities for research and applications.
Affiliation(s)
- Pedro Escudeiro
- BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Christopher S. Henry
- Argonne National Laboratory, Lemont, Illinois, USA; University of Chicago, Chicago, Illinois, USA
- Ricardo P.M. Dias
- BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal; iXLab - Innovation for National Biological Resilience, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal (corresponding author)

18
Solís-Fernández G, Montero-Calle A, Sánchez-Martínez M, Peláez-García A, Fernández-Aceñero MJ, Pallarés P, Alonso-Navarro M, Mendiola M, Hendrix J, Hardisson D, Bartolomé RA, Hofkens J, Rocha S, Barderas R. Aryl-hydrocarbon receptor-interacting protein regulates tumorigenic and metastatic properties of colorectal cancer cells driving liver metastasis. Br J Cancer 2022. [DOI: 10.1038/s41416-022-01762-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]

19
Aryl-hydrocarbon receptor-interacting protein regulates tumorigenic and metastatic properties of colorectal cancer cells driving liver metastasis. Br J Cancer 2022; 126:1604-1615. [PMID: 35347323 PMCID: PMC9130499 DOI: 10.1038/s41416-022-01762-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/25/2021] [Revised: 02/07/2022] [Accepted: 02/15/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Liver metastasis is the primary cause of colorectal cancer (CRC)-associated death. Aryl-hydrocarbon receptor-interacting protein (AIP), a putative positive intermediary in aryl-hydrocarbon receptor-mediated signalling, is overexpressed in highly metastatic human KM12SM CRC cells and other highly metastatic CRC cells. METHODS Meta-analysis and immunohistochemistry were used to assess the relevance of AIP. Cellular functions and signalling mechanisms mediated by AIP were assessed by gain-of-function experiments and in vitro and in vivo experiments. RESULTS A significant association between high AIP expression and poor survival of CRC patients was observed. Gain-of-function and quantitative proteomics experiments demonstrated that AIP increased the tumorigenic and metastatic properties of isogenic KM12C (poorly metastatic) and KM12SM (highly metastatic to the liver) CRC cells. AIP overexpression dysregulated epithelial-to-mesenchymal transition (EMT) markers and induced several transcription factors and Cadherin-17 activation. The former induced the signalling activation of AKT, SRC and JNK kinases to increase adhesion, migration and invasion of CRC cells. In vivo, AIP-expressing KM12 cells induced tumour growth and liver metastasis. Furthermore, KM12C (poorly metastatic) cells ectopically expressing AIP became metastatic to the liver. CONCLUSIONS Our data reveal new roles for AIP in regulating proteins associated with cancer and metastasis to induce tumorigenic and metastatic properties in colon cancer cells driving liver metastasis.

20
Griffith CM, Walvekar AS, Linster CL. Approaches for completing metabolic networks through metabolite damage and repair discovery. Curr Opin Syst Biol 2021; 28. [PMID: 34957344 PMCID: PMC8669784 DOI: 10.1016/j.coisb.2021.100379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/08/2023]
Abstract
Metabolites are prone to damage, either via enzymatic side reactions, which collectively form the underground metabolism, or via spontaneous chemical reactions. The resulting non-canonical metabolites, which can be toxic, are mended by dedicated "metabolite repair enzymes." Deficiencies in these repair enzymes can cause severe disease in humans, whereas including repair enzymes in metabolically engineered systems can improve the production yield of value-added chemicals. Metabolite damage and repair loops are typically not yet included in metabolic reconstructions, and it is likely that many remain to be discovered. Here, we review strategies and associated challenges for unveiling non-canonical metabolites and metabolite repair enzymes, including systematic approaches based on high-resolution mass spectrometry, metabolome-wide side-activity prediction, and high-throughput substrate and phenotypic screens.
Affiliation(s)
- Carole L. Linster
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg

21
Integrated mass spectrometry-based multi-omics for elucidating mechanisms of bacterial virulence. Biochem Soc Trans 2021; 49:1905-1926. [PMID: 34374408 DOI: 10.1042/bst20191088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Revised: 07/19/2021] [Accepted: 07/21/2021] [Indexed: 11/17/2022]
Abstract
Despite being considered the simplest form of life, bacteria remain enigmatic, particularly with respect to pathogenesis and evolving antimicrobial resistance. After three decades of genomics, we remain some way from understanding these organisms, and a substantial proportion of genes remain functionally unknown. Methodological advances, principally in mass spectrometry (MS), are paving the way for parallel analysis of the proteome, metabolome and lipidome. Each provides a global assay complementary to genomics and improves our ability to understand how pathogens respond to changes in their internal (e.g. mutation) and external environments under infection-like conditions. Such responses include accessing the nutrients needed for survival in a hostile environment where co-colonizing bacteria and normal flora are acclimated to the prevailing conditions. Multi-omics can be harnessed across temporal and spatial (sub-cellular) dimensions to understand adaptation at the molecular level. Gene deletion libraries, in conjunction with large-scale approaches and evolving bioinformatics integration, will greatly facilitate next-generation vaccines and antimicrobial interventions by highlighting novel targets and pathogen-specific pathways. MS is also central to the phenotypic characterization of surface biomolecules such as lipid A, and aids in the determination of protein interactions and complexes. There is increasing evidence that bacteria are capable of widespread post-translational modification, including phosphorylation, glycosylation and acetylation, each of which contributes to virulence. This review focuses on the bacterial genotype-to-phenotype transition and surveys recent literature showing how the genome can be validated at the proteome, metabolome and lipidome levels to provide an integrated view of the organism's response to host conditions.

22
de Rond T, Asay JE, Moore BS. Co-occurrence of enzyme domains guides the discovery of an oxazolone synthetase. Nat Chem Biol 2021; 17:794-799. [PMID: 34099916 PMCID: PMC8238888 DOI: 10.1038/s41589-021-00808-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/27/2020] [Accepted: 04/29/2021] [Indexed: 02/04/2023]
Abstract
Multidomain enzymes orchestrate two or more catalytic activities to carry out metabolic transformations with increased control and speed. Here, we report the design and development of a genome-mining approach for targeted discovery of biochemical transformations through the analysis of co-occurring enzyme domains (CO-ED) in a single protein. CO-ED was designed to identify unannotated multifunctional enzymes for functional characterization and discovery based on the premise that linked enzyme domains have evolved to function collaboratively. Guided by CO-ED, we targeted an unannotated predicted ThiF-nitroreductase di-domain enzyme found in more than 50 proteobacteria. Through heterologous expression and biochemical reconstitution, we discovered a series of natural products containing the rare oxazolone heterocycle and characterized their biosynthesis. Notably, we identified the di-domain enzyme as an oxazolone synthetase, validating CO-ED-guided genome mining as a methodology with potential broad utility for both the discovery of unusual enzymatic transformations and the functional annotation of multidomain enzymes.
Affiliation(s)
- Tristan de Rond
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093
- Julia E. Asay
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093
- Bradley S. Moore
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093

23
Queirós P, Delogu F, Hickl O, May P, Wilmes P. Mantis: flexible and consensus-driven genome annotation. Gigascience 2021; 10:6291114. [PMID: 34076241 PMCID: PMC8170692 DOI: 10.1093/gigascience/giab042] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/04/2020] [Revised: 03/22/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022]
Abstract
Background The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. Results We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. Conclusions Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
Affiliation(s)
- Pedro Queirós
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Francesco Delogu
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Oskar Hickl
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Paul Wilmes
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg

24
Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which causes individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
- The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eduardo Rosa-Molinar
- Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
- Robert E Ward
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Hongbin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Breeanna R Urbanowicz
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
- A Mark Settles
- Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA USA

25
Poudel S, Cope AL, O'Dell KB, Guss AM, Seo H, Trinh CT, Hettich RL. Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Biotechnol Biofuels 2021; 14:116. [PMID: 33971924 PMCID: PMC8112048 DOI: 10.1186/s13068-021-01964-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/29/2020] [Accepted: 04/26/2021] [Indexed: 05/13/2023]
Abstract
BACKGROUND Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research. RESULTS We optimized and employed a pipeline integrating various "guilt-by-association" (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions. CONCLUSIONS This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
Collapse
Affiliation(s)
- Suresh Poudel
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Alexander L Cope
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Kaela B O'Dell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Adam M Guss
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Hyeongmin Seo
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
- Cong T Trinh
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
26
The metalloprotein YhcH is an anomerase providing N-acetylneuraminate aldolase with the open form of its substrate. J Biol Chem 2021; 296:100699. [PMID: 33895133 PMCID: PMC8141875 DOI: 10.1016/j.jbc.2021.100699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/23/2021] [Revised: 04/15/2021] [Accepted: 04/21/2021] [Indexed: 11/24/2022]
Abstract
N-acetylneuraminate (Neu5Ac), an abundant sugar present in glycans in vertebrates and some bacteria, can be used as an energy source by several prokaryotes, including Escherichia coli. In solution, more than 99% of Neu5Ac is in cyclic form (≈92% beta-anomer and ≈7% alpha-anomer), whereas <0.5% is in the open form. The aldolase that initiates Neu5Ac metabolism in E. coli, NanA, has been reported to act on the alpha-anomer. Surprisingly, when we performed this reaction at pH 6 to minimize spontaneous anomerization, we found that NanA and its human homolog NPL preferentially metabolize the open form of this substrate. We tested whether the E. coli Neu5Ac anomerase NanM could promote turnover and found that it stimulated the utilization of both beta- and alpha-anomers by NanA in vitro. However, NanM is localized in the periplasmic space and cannot facilitate Neu5Ac metabolism by NanA in the cytoplasm in vivo. We discovered that YhcH, a cytoplasmic protein encoded by many Neu5Ac catabolic operons and belonging to a protein family of unknown function (DUF386), also facilitated Neu5Ac utilization by NanA and NPL and displayed Neu5Ac anomerase activity in vitro. YhcH contains Zn, and its accelerating effect on the aldolase reaction was inhibited by metal chelators. Remarkably, several transition metals accelerated Neu5Ac anomerization in the absence of enzyme. Experiments with E. coli mutants indicated that YhcH expression provides a selective advantage for growth on Neu5Ac. In conclusion, YhcH plays the unprecedented role of providing an aldolase with the preferred unstable open form of its substrate.
27
Zangelmi E, Stanković T, Malatesta M, Acquotti D, Pallitsch K, Peracchi A. Discovery of a New, Recurrent Enzyme in Bacterial Phosphonate Degradation: (R)-1-Hydroxy-2-aminoethylphosphonate Ammonia-lyase. Biochemistry 2021; 60:1214-1225. [PMID: 33830741 PMCID: PMC8154272 DOI: 10.1021/acs.biochem.1c00092] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/01/2021] [Revised: 03/26/2021] [Indexed: 01/09/2023]
Abstract
Phosphonates represent an important source of bioavailable phosphorus in certain environments. Accordingly, many microorganisms (particularly marine bacteria) possess catabolic pathways to degrade these molecules. One example is the widespread hydrolytic route for the breakdown of 2-aminoethylphosphonate (AEP, the most common biogenic phosphonate). In this pathway, the aminotransferase PhnW initially converts AEP into phosphonoacetaldehyde (PAA), which is then cleaved by the hydrolase PhnX to yield acetaldehyde and phosphate. This work focuses on a pyridoxal 5'-phosphate-dependent enzyme that is encoded in >13% of the bacterial gene clusters containing the phnW-phnX combination. This enzyme (which we termed PbfA) is annotated as a transaminase, but there is no obvious need for an additional transamination reaction in the established AEP degradation pathway. We report here that PbfA from the marine bacterium Vibrio splendidus catalyzes an elimination reaction on the naturally occurring compound (R)-1-hydroxy-2-aminoethylphosphonate (R-HAEP). The reaction releases ammonia and generates PAA, which can then be hydrolyzed by PhnX. In contrast, PbfA is not active toward the S enantiomer of HAEP or other HAEP-related compounds such as ethanolamine and D,L-isoserine, indicating a very high substrate specificity. We also show that R-HAEP (despite being structurally similar to AEP) is not processed efficiently by the PhnW-PhnX couple in the absence of PbfA. In summary, the reaction catalyzed by PbfA serves to funnel R-HAEP into the hydrolytic pathway for AEP degradation, expanding the scope and the usefulness of the pathway itself.
Affiliation(s)
- Erika Zangelmi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, I-43124 Parma, Italy
- Toda Stanković
- Institute of Organic Chemistry, University of Vienna, Währingerstrasse 38, A-1090 Vienna, Austria
- Marco Malatesta
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, I-43124 Parma, Italy
- Domenico Acquotti
- Centro di Servizi e Misure “Giuseppe Casnati”, University of Parma, I-43124 Parma, Italy
- Katharina Pallitsch
- Institute of Organic Chemistry, University of Vienna, Währingerstrasse 38, A-1090 Vienna, Austria
- Alessio Peracchi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, I-43124 Parma, Italy
28
Commichaux S, Shah N, Ghurye J, Stoppel A, Goodheart JA, Luque GG, Cummings MP, Pop M. A critical assessment of gene catalogs for metagenomic analysis. Bioinformatics 2021; 37:2848-2857. [PMID: 33792639 PMCID: PMC8479683 DOI: 10.1093/bioinformatics/btab216] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/30/2020] [Revised: 02/02/2021] [Accepted: 03/31/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. RESULTS As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC); however, our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and on its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. AVAILABILITY AND IMPLEMENTATION All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Seth Commichaux
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA; Biological Science Graduate Program, University of Maryland, College Park, MD, 20742, USA; Division of Molecular Biology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, Maryland, 20708, USA
- Nidhi Shah
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA; Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Jay Ghurye
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA; Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Alexander Stoppel
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
- Jessica A Goodheart
- Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, 92037, USA
- Guillermo G Luque
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
- Michael P Cummings
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
- Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA; Department of Computer Science, University of Maryland, College Park, MD, 20742, USA (corresponding author)
29
Abstract
The human microbiome encodes a second genome that dwarfs the genetic capacity of the host. Microbiota-derived small molecules can directly target human cells and their receptors or indirectly modulate host responses through functional interactions with other microbes in their ecological niche. Their biochemical complexity has profound implications for nutrition, immune system development, disease progression, and drug metabolism, as well as the variation in these processes that exists between individuals. While the species composition of the human microbiome has been deeply explored, detailed mechanistic studies linking specific microbial molecules to host phenotypes are still nascent. In this review, we discuss challenges in decoding these interaction networks, which require interdisciplinary approaches that combine chemical biology, microbiology, immunology, genetics, analytical chemistry, bioinformatics, and synthetic biology. We highlight important classes of microbiota-derived small molecules and notable examples. An understanding of these molecular mechanisms is central to realizing the potential of precision microbiome editing in health, disease, and therapeutic responses.
Affiliation(s)
- Emilee E Shine
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, Connecticut 06536, USA; Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, USA; current affiliation: Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
- Jason M Crawford
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, Connecticut 06536, USA; Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, USA; Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA
30
Bernstein DB, Sulheim S, Almaas E, Segrè D. Addressing uncertainty in genome-scale metabolic model reconstruction and analysis. Genome Biol 2021; 22:64. [PMID: 33602294 PMCID: PMC7890832 DOI: 10.1186/s13059-021-02289-z] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 07/22/2020] [Accepted: 02/04/2021] [Indexed: 02/07/2023]
Abstract
The reconstruction and analysis of genome-scale metabolic models constitutes a powerful systems biology approach, with applications ranging from basic understanding of genotype-phenotype mapping to solving biomedical and environmental problems. However, the biological insight obtained from these models is limited by multiple heterogeneous sources of uncertainty, which are often difficult to quantify. Here we review the major sources of uncertainty and survey existing approaches developed for representing and addressing them. A unified formal characterization of these uncertainties through probabilistic approaches and ensemble modeling will facilitate convergence towards consistent reconstruction pipelines, improved data integration algorithms, and more accurate assessment of predictive capacity.
Affiliation(s)
- David B Bernstein
- Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, MA, USA
- Snorre Sulheim
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Department of Biotechnology and Nanomedicine, SINTEF Industry, Trondheim, Norway
- Eivind Almaas
- Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- K.G. Jebsen Center for Genetic Epidemiology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Daniel Segrè
- Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology and Department of Physics, Boston University, Boston, MA, USA
31
Gu S, Milenković T. Data-driven biological network alignment that uses topological, sequence, and functional information. BMC Bioinformatics 2021; 22:34. [PMID: 33514304 PMCID: PMC7847157 DOI: 10.1186/s12859-021-03971-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Accepted: 01/15/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Network alignment (NA) can transfer functional knowledge between species' conserved biological network regions. Traditional NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. So, we redefined NA as a data-driven method called TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. RESULTS Here, we propose TARA++, which is data-driven like TARA and unlike other existing methods, but which, unlike TARA, uses across-network sequence information on top of within-network topological information. To deal with the within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms existing methods in protein functional prediction accuracy. CONCLUSIONS As such, combining research knowledge from different domains is promising. Overall, improvements in protein functional prediction have biomedical implications, for example allowing researchers to better understand how cancer progresses or how humans age.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
32
Stack TMM, Gerlt JA. Discovery of novel pathways for carbohydrate metabolism. Curr Opin Chem Biol 2020; 61:63-70. [PMID: 33197748 DOI: 10.1016/j.cbpa.2020.09.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 01/09/2023]
Abstract
Closing the gap between the increasing availability of complete genome sequences and the discovery of novel enzymes in novel metabolic pathways is a significant challenge. Here, we review recent examples of assignment of in vitro enzymatic activities and in vivo metabolic functions to uncharacterized proteins, with a focus on enzymes and metabolic pathways involved in the catabolism and biosynthesis of monosaccharides and polysaccharides. The most effective approaches are based on analyses of sequence-function space in protein families that provide clues for the predictions of the functions of the uncharacterized enzymes. As summarized in this Opinion, this approach allows the discovery of the catabolism of new molecules, new pathways for common molecules, and new enzymatic chemistries.
Affiliation(s)
- Tyler M M Stack
- Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, United States
- John A Gerlt
- Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, United States; Departments of Biochemistry and Chemistry, University of Illinois, Urbana, IL 61801, United States
33
Nedoluzhko A, Gruzdeva N, Sharko F, Rastorguev S, Zakharova N, Kostyuk G, Ushakov V. The Biomarker and Therapeutic Potential of Circular RNAs in Schizophrenia. Cells 2020; 9:E2238. [PMID: 33020462 PMCID: PMC7601372 DOI: 10.3390/cells9102238] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/19/2020] [Revised: 09/29/2020] [Accepted: 10/01/2020] [Indexed: 12/14/2022]
Abstract
Circular RNAs (circRNAs) are endogenous, single-stranded, most frequently non-coding RNA (ncRNA) molecules that play a significant role in gene expression regulation. Circular RNAs can affect microRNA functionality, interact with RNA-binding proteins (RBPs), be translated into proteins themselves, and directly or indirectly modulate gene expression during different cellular processes. Altered expression of circRNAs, as well as of their targets, can trigger a cascade of events in the genetic regulatory network that causes pathological conditions. Recent studies have shown that altered circular RNA expression patterns could be used as biomarkers in psychiatric diseases, including schizophrenia (SZ); moreover, circular RNAs together with other cell molecules could provide new insight into mechanisms of this disorder. In this review, we focus on the role of circular RNAs in the pathogenesis of SZ and analyze their biomarker and therapeutic potential in this disorder.
Affiliation(s)
- Artem Nedoluzhko
- Faculty of Biosciences and Aquaculture, Nord University, PB 1490, 8049 Bodø, Norway
- Mental-Health Clinic No. 1 Named after N.A. Alexeev, Moscow Healthcare Department, Zagorodnoye Highway, 2, 115191 Moscow, Russia
- Natalia Gruzdeva
- National Research Center “Kurchatov Institute”, 1st Akademika Kurchatova Square, 123182 Moscow, Russia
- Fedor Sharko
- National Research Center “Kurchatov Institute”, 1st Akademika Kurchatova Square, 123182 Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Leninsky prospect 33/2, 119071 Moscow, Russia
- Sergey Rastorguev
- National Research Center “Kurchatov Institute”, 1st Akademika Kurchatova Square, 123182 Moscow, Russia
- Natalia Zakharova
- Mental-Health Clinic No. 1 Named after N.A. Alexeev, Moscow Healthcare Department, Zagorodnoye Highway, 2, 115191 Moscow, Russia
- Georgy Kostyuk
- Mental-Health Clinic No. 1 Named after N.A. Alexeev, Moscow Healthcare Department, Zagorodnoye Highway, 2, 115191 Moscow, Russia
- Vadim Ushakov
- Mental-Health Clinic No. 1 Named after N.A. Alexeev, Moscow Healthcare Department, Zagorodnoye Highway, 2, 115191 Moscow, Russia
- Institute for Advanced Brain Studies, Lomonosov Moscow State University, Leninskiye Gory, 119899 Moscow, Russia
34
Pyridoxamine-phosphate oxidases and pyridoxamine-phosphate oxidase-related proteins catalyze the oxidation of 6-NAD(P)H to NAD(P). Biochem J 2020; 476:3033-3052. [PMID: 31657440 DOI: 10.1042/bcj20190602] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/19/2019] [Revised: 09/30/2019] [Accepted: 10/03/2019] [Indexed: 11/17/2022]
Abstract
6-NADH and 6-NADPH, which may form spontaneously from NAD(P)H, are strong inhibitors of several dehydrogenases. They are known to be oxidized to NAD(P)+ by mammalian renalase, an FAD-linked enzyme mainly present in heart and kidney, and by related bacterial enzymes. We partially purified an enzyme oxidizing 6-NADPH from rat liver and, surprisingly, identified it as pyridoxamine-phosphate oxidase (PNPO). This was confirmed by the finding that recombinant mouse PNPO oxidized 6-NADH and 6-NADPH with catalytic efficiencies comparable to those observed with pyridoxine- and pyridoxamine-5'-phosphate. PNPOs from Escherichia coli, Saccharomyces cerevisiae and Arabidopsis thaliana also displayed 6-NAD(P)H oxidase activity, indicating that this 'side-activity' is conserved. Remarkably, 'pyridoxamine-phosphate oxidase-related proteins' (PNPO-RP) from Nostoc punctiforme, A. thaliana and the yeast S. cerevisiae (Ygr017w) were not detectably active on pyridox(am)ine-5'-P, but oxidized 6-NADH, 6-NADPH and 2-NADH, suggesting that this may be their main catalytic function. Their specificity profiles were therefore similar to that of renalase. Inactivation of renalase and of PNPO in mammalian cells and of Ygr017w in yeast led to the accumulation of a reduced form of 6-NADH, tentatively identified as 4,5,6-NADH3, which can also be produced in vitro by reduction of 6-NADH by glyceraldehyde-3-phosphate dehydrogenase or glucose-6-phosphate dehydrogenase. As 4,5,6-NADH3 is not a substrate for renalase, PNPO or PNPO-RP, its accumulation presumably reflects the block in the oxidation of 6-NADH. These findings indicate that two different classes of enzymes using either FAD (renalase) or FMN (PNPOs and PNPO-RPs) as a cofactor play an as yet unsuspected role in removing damaged forms of NAD(P).
35
Clark TJ, Guo L, Morgan J, Schwender J. Modeling Plant Metabolism: From Network Reconstruction to Mechanistic Models. Annu Rev Plant Biol 2020; 71:303-326. [PMID: 32017600 DOI: 10.1146/annurev-arplant-050718-100221] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 06/10/2023]
Abstract
Mathematical modeling of plant metabolism enables the plant science community to understand the organization of plant metabolism, obtain quantitative insights into metabolic functions, and derive engineering strategies for manipulation of metabolism. Among the various modeling approaches, metabolic pathway analysis can dissect the basic functional modes of subsections of core metabolism, such as photorespiration, and reveal how classical definitions of metabolic pathways have overlapping functionality. In the many studies using constraint-based modeling in plants, numerous computational tools are currently available to analyze large-scale and genome-scale metabolic networks. For ¹³C-metabolic flux analysis, principles of isotopic steady state have been used to study heterotrophic plant tissues, while nonstationary isotope labeling approaches are amenable to the study of photoautotrophic and secondary metabolism. Enzyme kinetic models explore pathways in mechanistic detail, and we discuss different approaches to determine or estimate kinetic parameters. In this review, we describe recent advances and challenges in modeling plant metabolism.
Affiliation(s)
- Teresa J Clark
- Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
- Longyun Guo
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- John Morgan
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Jorg Schwender
- Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
36
Aspartate aminotransferase Rv3722c governs aspartate-dependent nitrogen metabolism in Mycobacterium tuberculosis. Nat Commun 2020; 11:1960. [PMID: 32327655 PMCID: PMC7181641 DOI: 10.1038/s41467-020-15876-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 09/12/2019] [Accepted: 03/31/2020] [Indexed: 01/01/2023]
Abstract
Gene rv3722c of Mycobacterium tuberculosis is essential for in vitro growth, and encodes a putative pyridoxal phosphate-binding protein of unknown function. Here we use metabolomic, genetic and structural approaches to show that Rv3722c is the primary aspartate aminotransferase of M. tuberculosis, and mediates an essential but underrecognized role in metabolism: nitrogen distribution. Rv3722c deficiency leads to virulence attenuation in macrophages and mice. Our results identify aspartate biosynthesis and nitrogen distribution as potential species-selective drug targets in M. tuberculosis.
37
Sung AY, Floyd BJ, Pagliarini DJ. Systems Biochemistry Approaches to Defining Mitochondrial Protein Function. Cell Metab 2020; 31:669-678. [PMID: 32268114 PMCID: PMC7176052 DOI: 10.1016/j.cmet.2020.03.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/19/2019] [Revised: 03/06/2020] [Accepted: 03/13/2020] [Indexed: 02/07/2023]
Abstract
Defining functions for the full complement of proteins is a grand challenge in the post-genomic era and is essential for our understanding of basic biology and disease pathogenesis. In recent times, this endeavor has benefitted from a combination of modern large-scale and classical reductionist approaches (a process we refer to as "systems biochemistry") that helps surmount traditional barriers to the characterization of poorly understood proteins. This strategy is proving to be particularly effective for mitochondria, whose well-defined proteome has enabled comprehensive analyses of the full mitochondrial system that can position understudied proteins for fruitful mechanistic investigations. Recent systems biochemistry approaches have accelerated the identification of new disease-related mitochondrial proteins and of long-sought "missing" proteins that fulfill key functions. Collectively, these studies are moving us toward a more complete understanding of mitochondrial activities and providing a molecular framework for the investigation of mitochondrial pathogenesis.
Affiliation(s)
- Andrew Y Sung
- Morgridge Institute for Research, Madison, WI, USA; Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA; School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
- Brendan J Floyd
- Morgridge Institute for Research, Madison, WI, USA; Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA; Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- David J Pagliarini
- Morgridge Institute for Research, Madison, WI, USA; Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
38
Tawfik DS, Gruic-Sovulj I. How evolution shapes enzyme selectivity - lessons from aminoacyl-tRNA synthetases and other amino acid utilizing enzymes. FEBS J 2020; 287:1284-1305. [PMID: 31891445 DOI: 10.1111/febs.15199] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/25/2019] [Revised: 12/08/2019] [Accepted: 12/30/2019] [Indexed: 12/21/2022]
Abstract
Aminoacyl-tRNA synthetases (AARSs) charge tRNA with their cognate amino acids. Many other enzymes use amino acids as substrates, yet discrimination against noncognate amino acids that threaten the accuracy of protein translation is a hallmark of AARSs. Comparing AARSs to these other enzymes allowed us to recognize patterns in molecular recognition and strategies used by evolution for exercising selectivity. Overall, AARSs are 2-3 orders of magnitude more selective than most other amino acid utilizing enzymes. AARSs also reveal the physicochemical limits of molecular discrimination. For example, amino acids smaller by a single methyl moiety present a discrimination ceiling of ~200, while larger ones can be discriminated by up to 10⁵-fold. In contrast, substrates larger by a hydroxyl group challenge AARS selectivity, due to promiscuous H-bonding with polar active site groups. This 'hydroxyl paradox' is resolved by editing. Indeed, when the physicochemical discrimination limits are reached, post-transfer editing (hydrolysis of tRNAs charged with noncognate amino acids) evolved. The editing site often selectively recognizes the edited noncognate substrate using the very same feature that the synthetic site could not efficiently discriminate against. Finally, the comparison to other enzymes also reveals that the selectivity of AARSs is an explicitly evolved trait, showing some clear examples of how selection acted not only to optimize catalytic efficiency with the target substrate, but also to abolish activity with noncognate threat substrates ('negative selection').
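Selectivity figures like these are conventionally expressed as a discrimination factor D. D is the ratio of specificity constants (k_cat/K_M) for the cognate and noncognate substrates. The definition and the competing-substrate rate relation below are standard enzyme-kinetics background, not material taken from the review. They assume both substrates compete for the same active site. The two D values only restate the ~200 and 10⁵ figures from the abstract, evaluated at equal substrate concentrations.

```latex
D = \frac{(k_{\mathrm{cat}}/K_M)_{\mathrm{cognate}}}{(k_{\mathrm{cat}}/K_M)_{\mathrm{noncognate}}},
\qquad
\frac{v_{\mathrm{noncognate}}}{v_{\mathrm{cognate}}}
  = \frac{1}{D}\,\frac{[\mathrm{noncognate}]}{[\mathrm{cognate}]}

% With equal substrate concentrations the error frequency is about 1/D:
D \approx 200 \;\Rightarrow\; \text{error frequency} \approx 5 \times 10^{-3},
\qquad
D = 10^{5} \;\Rightarrow\; \text{error frequency} \approx 10^{-5}
```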
Affiliation(s)
- Dan S Tawfik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Ita Gruic-Sovulj
- Department of Chemistry, Faculty of Science, University of Zagreb, Croatia
39
Zallot R, Oberg N, Gerlt JA. The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019; 58:4169-4182. [PMID: 31553576 DOI: 10.1021/acs.biochem.9b00735] [Citation(s) in RCA: 395] [Impact Index Per Article: 79.0] [Indexed: 01/14/2023]
Abstract
The assignment of functions to uncharacterized proteins discovered in genome projects requires easily accessible tools and computational resources for large-scale, user-friendly leveraging of the protein, genome, and metagenome databases by experimentalists. This article describes the web resource developed by the Enzyme Function Initiative (EFI; accessed at https://efi.igb.illinois.edu/) that provides "genomic enzymology" tools ("web tools") for (1) generating sequence similarity networks (SSNs) for protein families (EFI-EST); (2) analyzing and visualizing genome context of the proteins in clusters in SSNs (in genome neighborhood networks, GNNs, and genome neighborhood diagrams, GNDs) (EFI-GNT); and (3) prioritizing uncharacterized SSN clusters for functional assignment based on metagenome abundance (chemically guided functional profiling, CGFP) (EFI-CGFP). The SSNs generated by EFI-EST are used as the input for EFI-GNT and EFI-CGFP, enabling easy transfer of information among the tools. The networks are visualized and analyzed using Cytoscape, a widely used desktop application; GNDs and CGFP heatmaps summarizing metagenome abundance are viewed within the tools. We provide a detailed example of the integrated use of the tools with an analysis of the glycyl radical enzyme superfamily (IPR004184) found in the human gut microbiome. This analysis demonstrates that (1) SwissProt annotations are not always correct, (2) large-scale genome context analyses allow the prediction of novel metabolic pathways, and (3) metagenome abundance can be used to identify and prioritize uncharacterized proteins for functional investigation.
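The basic idea behind an SSN can be sketched as a thresholded graph. Proteins are nodes, and an edge joins two proteins whose pairwise similarity score meets a cutoff. Clusters are then the connected components of that graph. This is a minimal sketch that assumes the pairwise scores are already computed. The protein IDs, scores, threshold, and `ssn_clusters` helper are hypothetical, and this is not the EFI-EST implementation.

```python
def ssn_clusters(nodes, scores, threshold):
    """Hypothetical helper: group proteins into SSN clusters, i.e. connected
    components of the graph whose edges are pairs with score >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Path halving keeps the union-find trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first; ties broken alphabetically for a stable listing.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))

nodes = ["P1", "P2", "P3", "P4", "P5"]
scores = {("P1", "P2"): 80, ("P2", "P3"): 45, ("P4", "P5"): 90, ("P3", "P4"): 20}
print(ssn_clusters(nodes, scores, threshold=40))
```

Raising the threshold removes weaker edges and splits clusters. At a threshold of 50, for example, P3 becomes a singleton. Adjusting the threshold this way is one common method for tuning an SSN toward smaller clusters that are more likely to be functionally homogeneous.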
40
Metabolite Repair Enzymes Control Metabolic Damage in Glycolysis. Trends Biochem Sci 2019; 45:228-243. [PMID: 31473074 DOI: 10.1016/j.tibs.2019.07.004] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 05/10/2019] [Revised: 07/19/2019] [Accepted: 07/31/2019] [Indexed: 12/29/2022]
Abstract
Hundreds of metabolic enzymes work together smoothly in a cell. These enzymes are highly specific. Nevertheless, under physiological conditions, many perform side-reactions at low rates, producing potentially toxic side-products. An increasing number of metabolite repair enzymes are being discovered that serve to eliminate these noncanonical metabolites. Some of these enzymes are extraordinarily conserved, and their deficiency can lead to diseases in humans or embryonic lethality in mice, indicating their central role in cellular metabolism. We discuss how metabolite repair enzymes eliminate glycolytic side-products and prevent negative interference within and beyond this core metabolic pathway. Extrapolating from the number of metabolite repair enzymes involved in glycolysis, hundreds more likely remain to be discovered that protect a wide range of metabolic pathways.
41
Berezovsky IN. Towards descriptor of elementary functions for protein design. Curr Opin Struct Biol 2019; 58:159-165. [PMID: 31352188 DOI: 10.1016/j.sbi.2019.06.010] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/19/2019] [Accepted: 06/18/2019] [Indexed: 11/18/2022]
Abstract
We review studies of protein evolution that help to formulate rules for protein design. Acknowledging the fundamental importance of Dayhoff's proposal that functional proteins emerged from short peptides, we discuss multiple lines of evidence for the omnipresent partitioning of protein globules into structural/functional units, the use of which greatly facilitates engineering and design efforts. Closed loops and elementary functional loops, which are descendants of ancient ring-like peptides that formed the first protein domains in agreement with Dayhoff's hypothesis, can be considered basic units of protein structure and function. We argue that future developments in protein design approaches should consider descriptors of the elementary functions, which will help to complement designed scaffolds with the functional signatures and flexibility necessary for their functions.
Affiliation(s)
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A⁎STAR), 30 Biopolis Street, #07-01, Matrix 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore.
42
Erb TJ. Back to the future: Why we need enzymology to build a synthetic metabolism of the future. Beilstein J Org Chem 2019; 15:551-557. [PMID: 30873239 PMCID: PMC6404388 DOI: 10.3762/bjoc.15.49] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/12/2018] [Accepted: 01/29/2019] [Indexed: 12/26/2022]
Abstract
Biology is turning from an analytical into a synthetic discipline. This is especially apparent in the field of metabolic engineering, where the concept of synthetic metabolism has recently been developed. Compared to classical metabolic engineering efforts, synthetic metabolism aims at creating novel metabolic networks rationally, from the bottom up. However, while the theoretical design of synthetic metabolic networks has made tremendous progress, the actual realization of such synthetic pathways still lags behind. This is mostly because of our limitations in enzyme discovery and engineering to provide the parts required to build synthetic metabolism. Here I discuss the current challenges and limitations in synthetic metabolic engineering and elucidate how modern-day enzymology can help to build a synthetic metabolism of the future.
Affiliation(s)
- Tobias J Erb
- Max-Planck-Institute for Terrestrial Microbiology, Department of Biochemistry & Synthetic Metabolism, Karl-von-Frisch-Str. 10, D-35043 Marburg, Germany; LOEWE Center for Synthetic Microbiology (SYNMIKRO), Marburg, Germany
43
Abstract
Cell-free protein synthesis (CFPS) has become an established tool for rapid protein synthesis in order to accelerate the discovery of new enzymes and the development of proteins with improved characteristics. Over the past years, the preparation of CFPS systems has been progressively simplified, and many applications have been developed as tailor-made solutions for specific purposes. In this review, various preparation methods of CFPS systems are compared and the significance of individual supplements is assessed. The recent applications of CFPS are summarized and the potential for biocatalyst development discussed. One of the central features is the high-throughput synthesis of protein variants, which enables sophisticated approaches for rapid prototyping of enzymes. These applications demonstrate the contribution of CFPS to enhancing enzyme functionalities and its complementarity to in vivo protein synthesis. However, several issues remain to be addressed, such as the low predictability of CFPS performance and its limited transferability to in vivo protein synthesis. Nevertheless, CFPS-based high-throughput enzyme screening has proven to be an efficient method for discovering novel biocatalysts and improved enzyme variants.
44
Towards functional characterization of archaeal genomic dark matter. Biochem Soc Trans 2019; 47:389-398. [PMID: 30710061 PMCID: PMC6393860 DOI: 10.1042/bst20180560] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/26/2018] [Revised: 01/08/2019] [Accepted: 01/09/2019] [Indexed: 01/07/2023]
Abstract
A substantial fraction of archaeal genes, from ∼30% to as much as 80%, encode ‘hypothetical’ proteins or genomic ‘dark matter’. Archaeal genomes typically contain a higher fraction of dark matter than bacterial genomes, primarily because isolation and cultivation of most archaea in the laboratory, and accordingly, experimental characterization of archaeal genes, are difficult. In the present study, we present quantitative characteristics of the archaeal genomic dark matter and discuss comparative genomic approaches for predicting the functions of ‘hypothetical’ proteins. We propose a list of top-priority candidates for experimental characterization, including proteins with a broad distribution among archaea and those that are characteristic of poorly studied major archaeal groups such as Thaumarchaea, DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaeota) and Asgard.
45
Trudeau DL, Tawfik DS. Protein engineers turned evolutionists: the quest for the optimal starting point. Curr Opin Biotechnol 2019; 60:46-52. [PMID: 30611116 DOI: 10.1016/j.copbio.2018.12.002] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 10/17/2018] [Revised: 11/22/2018] [Accepted: 12/03/2018] [Indexed: 12/12/2022]
Abstract
The advent of laboratory directed evolution yielded a fruitful crosstalk between the disciplines of molecular evolution and bio-engineering. Here, we outline recent developments in both disciplines with respect to how one can identify the best starting points for directed evolution, such that highly efficient and robust tailor-made enzymes can be obtained with minimal optimization. Directed evolution studies have highlighted essential features of engineer-able enzymes: highly stable, mutationally robust enzymes with the capacity to accept a broad range of substrates. Robust, evolvable enzymes can be inferred from the natural sequence record. Broad substrate spectrum relates to conformational plasticity and can also be predicted by phylogenetic analyses and/or by computational design. Overall, an increasingly powerful toolkit is becoming available for identifying optimal starting points including network analyses of enzyme superfamilies and other bioinformatics methods.
Affiliation(s)
- Devin L Trudeau
- Department of Biomolecular Sciences, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
- Dan S Tawfik
- Department of Biomolecular Sciences, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel.
46
Rinschen MM, Limbutara K, Knepper MA, Payne DM, Pisitkun T. From Molecules to Mechanisms: Functional Proteomics and Its Application to Renal Tubule Physiology. Physiol Rev 2019; 98:2571-2606. [PMID: 30182799 DOI: 10.1152/physrev.00057.2017] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/03/2023]
Abstract
Classical physiological studies using electrophysiological, biophysical, biochemical, and molecular techniques have created a detailed picture of molecular transport, bioenergetics, contractility and movement, and growth, as well as the regulation of these processes by external stimuli in cells and organisms. Newer systems biology approaches are beginning to provide deeper and broader understanding of these complex biological processes and their dynamic responses to a variety of environmental cues. In the past decade, advances in mass spectrometry-based proteomic technologies have provided invaluable tools to further elucidate these complex cellular processes, thereby confirming, complementing, and advancing common views of physiology. As one notable example, the application of proteomics to study the regulation of kidney function has yielded novel insights into the chemical and physical processes that tightly control body fluids, electrolytes, and metabolites to provide optimal microenvironments for various cellular and organ functions. Here, we systematically review, summarize, and discuss the most significant key findings from functional proteomic studies in renal epithelial physiology. We also identify further improvements in technological and bioinformatics methods that will be essential to advance precision medicine in nephrology.
Affiliation(s)
- Markus M Rinschen
- Department II of Internal Medicine, University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany; Division of Nephrology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Kavee Limbutara
- Department II of Internal Medicine, University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany; Division of Nephrology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Mark A Knepper
- Department II of Internal Medicine, University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany; Division of Nephrology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- D Michael Payne
- Department II of Internal Medicine, University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany; Division of Nephrology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Trairak Pisitkun
- Department II of Internal Medicine, University Hospital Cologne, Cologne, Germany; Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany; Division of Nephrology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
47
Griesemer M, Kimbrel JA, Zhou CE, Navid A, D'haeseleer P. Combining multiple functional annotation tools increases coverage of metabolic annotation. BMC Genomics 2018; 19:948. [PMID: 30567498 PMCID: PMC6299973 DOI: 10.1186/s12864-018-5221-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/20/2018] [Accepted: 11/05/2018] [Indexed: 12/15/2022]
Abstract
Background Genome-scale metabolic modeling is a cornerstone of systems biology analysis of microbial organisms and communities, yet these genome-scale modeling efforts are invariably based on incomplete functional annotations. Annotated genomes typically contain 30–50% of genes without functional annotation, severely limiting our knowledge of the “parts lists” that the organisms have at their disposal. These incomplete annotations may be sufficient to derive a model of a core set of well-studied metabolic pathways that support growth in pure culture. However, pathways important for growth on unusual metabolites exchanged in complex microbial communities are often less understood, resulting in missing functional annotations in newly sequenced genomes. Results Here, we present results on a comprehensive reannotation of 27 bacterial reference genomes, focusing on enzymes with EC numbers annotated by KEGG, RAST, EFICAz, and the BRENDA enzyme database, and on membrane transport annotations by TransportDB, KEGG and RAST. Our analysis shows that annotation using multiple tools can result in a drastically larger metabolic network reconstruction, adding on average 40% more EC numbers, 3–8 times more substrate-specific transporters, and 37% more metabolic genes. These results are even more pronounced for bacterial species that are phylogenetically distant from well-studied model organisms such as E. coli. Conclusions Metabolic annotations are often incomplete and inconsistent. Combining multiple functional annotation tools can greatly improve genome coverage and metabolic network size, especially for non-model organisms and non-core pathways.
Affiliation(s)
- Marc Griesemer
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA
- Jeffrey A Kimbrel
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA
- Carol E Zhou
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA
- Ali Navid
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA
- Patrik D'haeseleer
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA; Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, 94551, USA.
48
Harrison PM. Compositionally Biased Dark Matter in the Protein Universe. Proteomics 2018; 18:e1800069. [PMID: 30260558 DOI: 10.1002/pmic.201800069] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/16/2018] [Revised: 08/29/2018] [Indexed: 01/01/2023]
Abstract
Compositionally biased regions (BRs) occur when a few amino-acid types are enriched in a protein segment. There may be BR types in the known protein universe that have not been characterized experimentally. The UniProt protein database has been surveyed for evidence of such compositional “dark matter”. A “dark biased region” (DBR) is defined as a biased region with a low probability of being an individual structural domain or intrinsically disordered region. The bias annotation program fLPS is used to generate a list of >13 million BRs, which is then thoroughly filtered for structure and intrinsic disorder. About a third of BRs (31%) have both substantial intrinsic disorder and structure. After filtering, there are ≈0.9 million DBRs (≈7% of the original BRs in ≈1.4% of proteins). These DBRs are hugely enriched in eukaryotes and hugely depleted in bacteria. They tend to be more hydrophobic than other protein regions, but are made of less extreme combinations of hydrophobic/hydrophilic residues. Under varying assumptions, the number of DBRs that might exist at the high bias levels examined (p-values < 1 × 10⁻⁶) is estimated, giving a reasonable range of 0.7–7.2% of proteins having such DBRs. Hypotheses are examined about what such DBRs might be, that is, that they are from un- or undersampled domain/region categories or are unappreciated categories somewhat like existing ones.
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, QC, H3A 1B1, Canada
49
Xu YF, Lu W, Chen JC, Johnson SA, Gibney PA, Thomas DG, Brown G, May AL, Campagna SR, Yakunin AF, Botstein D, Rabinowitz JD. Discovery and Functional Characterization of a Yeast Sugar Alcohol Phosphatase. ACS Chem Biol 2018; 13:3011-3020. [PMID: 30240188 DOI: 10.1021/acschembio.8b00804] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
Sugar alcohols (polyols) exist widely in nature. While some specific sugar alcohol phosphatases are known, there is no known phosphatase for some important sugar alcohol phosphates (e.g., sorbitol-6-phosphate). Using liquid chromatography-mass spectrometry-based metabolomics, we screened yeast strains in which putative phosphatases of unknown function had been deleted. We show that the yeast gene YNL010W, which has close homologues in all fungal species and some plants, encodes a sugar alcohol phosphatase. We term this enzyme, which hydrolyzes sorbitol-6-phosphate, ribitol-5-phosphate, and D-glycerol-3-phosphate, polyol phosphatase 1 or PYP1. Polyol phosphates are structural analogs of the enediol intermediate of phosphoglucose isomerase (Pgi). We find that sorbitol-6-phosphate and ribitol-5-phosphate inhibit Pgi and that Pyp1 activity is important for yeast to maintain Pgi activity in the presence of environmental sugar alcohols. Pyp1 expression is strongly positively correlated with yeast growth rate, presumably because faster growth requires greater glycolytic and accordingly Pgi flux. Thus, yeast express the previously uncharacterized enzyme Pyp1 to prevent inhibition of glycolysis by sugar alcohol phosphates. Pyp1 may be useful for engineering sugar alcohol production.
Affiliation(s)
- Yi-Fan Xu
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Wenyun Lu
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Jonathan C. Chen
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Sarah A. Johnson
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Patrick A. Gibney
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- David G. Thomas
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Greg Brown
- Department of Chemical Engineering and Applied Chemistry, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Amanda L. May
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
- Shawn R. Campagna
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
- Alexander F. Yakunin
- Department of Chemical Engineering and Applied Chemistry, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- David Botstein
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, United States
- Joshua D. Rabinowitz
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
50
Wyman SK, Avila-Herrera A, Nayfach S, Pollard KS. A most wanted list of conserved microbial protein families with no known domains. PLoS One 2018; 13:e0205749. [PMID: 30332487 PMCID: PMC6192648 DOI: 10.1371/journal.pone.0205749] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/15/2018] [Accepted: 10/01/2018] [Indexed: 02/07/2023]
Abstract
The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a "most wanted" list of genes to prioritize for further characterization.
Affiliation(s)
- Stacia K. Wyman
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, Berkeley, CA, United States of America
- Aram Avila-Herrera
- Gladstone Institutes, San Francisco, CA, United States of America
- Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Stephen Nayfach
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, San Francisco, CA, United States of America
- DOE Joint Genome Institute, Walnut Creek, CA, United States of America
- Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, San Francisco, CA, United States of America
- Chan-Zuckerberg Biohub, San Francisco, CA, United States of America