1. Knoshaug EP, Sun P, Nag A, Nguyen H, Mattoon EM, Zhang N, Liu J, Chen C, Cheng J, Zhang R, St. John P, Umen J. Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis. Plant Direct 2023; 7:e527. [PMID: 38044962 PMCID: PMC10690477 DOI: 10.1002/pld3.527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/03/2023] [Accepted: 08/11/2023] [Indexed: 12/05/2023]
Abstract
The rapid accumulation of sequenced plant genomes in the past decade has outpaced the still difficult problem of genome-wide protein-coding gene annotation. A substantial fraction of protein-coding genes in all plant genomes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Setaria viridis (monocot), and Chlamydomonas reinhardtii (Chlorophyte alga). Using similarity searching, we identified a subset of unannotated proteins that were conserved between these species and defined them as Deep Green proteins. Bioinformatic, genomic, and structural predictions were performed to begin classifying Deep Green genes and proteins. Compared to whole proteomes for each species, the Deep Green set was enriched for proteins with predicted chloroplast targeting signals predictive of photosynthetic or plastid functions, a result that was consistent with enrichment for daylight phase diurnal expression patterning. Structural predictions using AlphaFold and comparisons to known structures showed that a significant proportion of Deep Green proteins may possess novel folds. Though only available for three organisms, the Deep Green genes and proteins provide a starting resource of high-value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.
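The conservation filter described above can be pictured as a set operation over precomputed similarity-search hits. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the hit tuple layout, the e-value cutoff of 1e-5, the species labels, and the toy protein IDs are all hypothetical, and it only checks that an unannotated query protein has a significant hit in every other species.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hit record: (query_id, query_species, subject_species, e_value)
Hit = Tuple[str, str, str, float]

# Species labels are illustrative names, not identifiers from the study.
SPECIES = ("Chlamydomonas", "Arabidopsis", "Setaria")


def conserved_unannotated(
    hits: List[Hit],
    unannotated: Dict[str, Set[str]],
    e_cutoff: float = 1e-5,  # assumed threshold, for illustration only
) -> Dict[str, Set[str]]:
    """Return, for each species, the unannotated proteins that have a hit
    at or below e_cutoff in every other species."""
    # Collect the set of other species each query protein matched.
    matched: Dict[Tuple[str, str], Set[str]] = {}
    for query, q_sp, s_sp, evalue in hits:
        if q_sp != s_sp and evalue <= e_cutoff:
            matched.setdefault((q_sp, query), set()).add(s_sp)

    result: Dict[str, Set[str]] = {}
    for sp in SPECIES:
        others = set(SPECIES) - {sp}
        # Keep a protein only if its matched species cover all other species.
        result[sp] = {
            prot
            for prot in unannotated.get(sp, set())
            if matched.get((sp, prot), set()) >= others
        }
    return result


# Toy data with hypothetical protein IDs.
hits = [
    ("cre_toy1", "Chlamydomonas", "Arabidopsis", 1e-20),
    ("cre_toy1", "Chlamydomonas", "Setaria", 1e-15),
    ("cre_toy2", "Chlamydomonas", "Arabidopsis", 1e-30),
    ("cre_toy2", "Chlamydomonas", "Setaria", 0.5),  # too weak to count
]
unannotated = {"Chlamydomonas": {"cre_toy1", "cre_toy2"}}
deep_green_like = conserved_unannotated(hits, unannotated)
print(deep_green_like["Chlamydomonas"])  # {'cre_toy1'}
```

Whether the matched partner proteins must themselves be unannotated is left open here; a stricter variant would also intersect the subjects with each species' unannotated set.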
Affiliation(s)
- Eric P. Knoshaug
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Peipei Sun
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Ambarish Nag
- Computational Sciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Huong Nguyen
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, USA
- Erin M. Mattoon
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Plant and Microbial Biosciences Program, Division of Biology and Biomedical Sciences, Washington University in Saint Louis, St. Louis, Missouri, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Ru Zhang
- Donald Danforth Plant Science Center, St. Louis, MO, USA
- Peter St. John
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- James Umen
- Donald Danforth Plant Science Center, St. Louis, MO, USA
2. de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:6663924. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by the more than 500 000 genomes sequenced to date. By different estimates, only half of the predicted proteins from sequenced genomes carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue, funded by the National Science Foundation, was held on 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists in the same venue allowed for a comprehensive assessment of the current state of functional annotation of protein families. Major issues obstructing the field were identified and discussed, which ultimately led to proposed solutions for moving forward.
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
- Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
- Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Lucy J Colwell
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
- Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
- Antoine Danchin
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
- Anita de Waard
- Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
- Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
- Gang Fang
- NYU-Shanghai, Shanghai 200120, China
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
- Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Darren A Natale
- Georgetown University Medical Center, Washington, DC 20007, USA
- William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
- Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
- Jeffrey D Rudolf
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
- Lana Saleh
- New England Biolabs, Ipswich, MA 01938, USA
- Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
- Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
3. Bergès C, Cahoreau E, Millard P, Enjalbert B, Dinclaux M, Heuillet M, Kulyk H, Gales L, Butin N, Chazalviel M, Palama T, Guionnet M, Sokol S, Peyriga L, Bellvert F, Heux S, Portais JC. Exploring the Glucose Fluxotype of the E. coli y-ome Using High-Resolution Fluxomics. Metabolites 2021; 11:metabo11050271. [PMID: 33926117 PMCID: PMC8145925 DOI: 10.3390/metabo11050271] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/24/2021] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 01/26/2023]
Abstract
We have developed a robust workflow to measure high-resolution fluxotypes (metabolic flux phenotypes) for large strain libraries under fully controlled growth conditions. This was achieved by optimizing and automating the whole high-throughput fluxomics process and integrating all relevant software tools. This workflow allowed us to obtain highly detailed maps of carbon fluxes in the central carbon metabolism in a fully automated manner. It was applied to investigate the glucose fluxotypes of 180 Escherichia coli strains deleted for y-genes. Since the products of these y-genes potentially play a role in a variety of metabolic processes, the experiments were designed to be agnostic as to their potential metabolic impact. The obtained data highlight the robustness of E. coli’s central metabolism to y-gene deletion. For two y-genes, deletion resulted in significant changes in carbon and energy fluxes, demonstrating the involvement of the corresponding y-gene products in metabolic function or regulation. This work also introduces novel metrics to measure the actual scope and quality of high-throughput fluxomics investigations.
Affiliation(s)
- Cécilia Bergès
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Edern Cahoreau
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Pierre Millard
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Brice Enjalbert
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Mickael Dinclaux
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Maud Heuillet
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Hanna Kulyk
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Lara Gales
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Noémie Butin
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- RESTORE, Université de Toulouse, Inserm U1031, CNRS 5070, UPS, EFS, 31100 Toulouse, France
- Maxime Chazalviel
- Toxalim (Research Centre in Food Toxicology), UMR1331, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- Tony Palama
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Matthieu Guionnet
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Sergueï Sokol
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Lindsay Peyriga
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Floriant Bellvert
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- Stéphanie Heux
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Jean-Charles Portais
- Toulouse Biotechnology Institute (TBI), Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077 Toulouse, France
- RESTORE, Université de Toulouse, Inserm U1031, CNRS 5070, UPS, EFS, 31100 Toulouse, France
4. Zimmermann J, Kaleta C, Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biol 2021; 22:81. [PMID: 33691770 PMCID: PMC7949252 DOI: 10.1186/s13059-021-02295-1] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 05/27/2020] [Accepted: 02/10/2021] [Indexed: 12/21/2022]
Abstract
Genome-scale metabolic models of microorganisms are powerful frameworks to predict phenotypes from an organism's genotype. While manual reconstructions are laborious, automated reconstructions often fail to recapitulate known metabolic processes. Here we present gapseq (https://github.com/jotech/gapseq), a new tool to predict metabolic pathways and automatically reconstruct microbial metabolic models using a curated reaction database and a novel gap-filling algorithm. On the basis of scientific literature and experimental data for 14,931 bacterial phenotypes, we demonstrate that gapseq outperforms state-of-the-art tools in predicting enzyme activity, carbon source utilisation, fermentation products, and metabolic interactions within microbial communities.
Affiliation(s)
- Johannes Zimmermann
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Christoph Kaleta
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Silvio Waschina
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Christian-Albrechts-University Kiel, Institute of Human Nutrition and Food Science, Nutriinformatics, Heinrich-Hecht-Platz 10, Kiel, 24118 Germany
5. Grosjean N, Blaby-Haas CE. Leveraging computational genomics to understand the molecular basis of metal homeostasis. New Phytol 2020; 228:1472-1489. [PMID: 32696981 DOI: 10.1111/nph.16820] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/22/2020] [Accepted: 07/03/2020] [Indexed: 06/11/2023]
Abstract
Genome-based data are helping to reveal the diverse strategies that plants and algae use to maintain metal homeostasis. In addition to the acquisition, distribution and storage of metals, acclimating to feast or famine can involve a wealth of genes that we are only now starting to understand. The fast-paced acquisition of genome-based data, however, far outpaces our ability to experimentally characterize protein function. Computational genomic approaches are needed to fill the gap between what is known and unknown. To avoid misconstruing bioinformatically derived data, which is the root cause of the inaccurate functional annotations that plague databases, functional inferences must draw on diverse sources, and that evidence must be contextualized with a robust understanding of protein family evolution. Phylogenomic and comparative-genomic studies can aid in the interpretation of experimental data or spark the discovery of a new function. These analyses not only provide novel insight into a target protein's function but can also generate thought-provoking insights across protein families.
Affiliation(s)
- Nicolas Grosjean
- Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
6. Zaman AB, Kamranfar P, Domeniconi C, Shehu A. Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering. Molecules 2020; 25:E2228. [PMID: 32397410 PMCID: PMC7248879 DOI: 10.3390/molecules25092228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Revised: 04/21/2020] [Accepted: 04/28/2020] [Indexed: 11/16/2022]
Abstract
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, effectively acknowledging that more structures increase the likelihood that some will lie near the sought biologically active structure. A major drawback of this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach that we demonstrate significantly reduces an ensemble of generated structures without sacrificing quality. Evaluations are reported on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly available at the links provided in Section 1.
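The general idea of shrinking a decoy ensemble by clustering and keeping one representative per cluster can be illustrated with greedy leader clustering. This is a swapped-in technique for illustration only, not the approach proposed in the paper. It assumes each structure is a list of C-alpha coordinates of equal length that has already been superimposed, uses plain coordinate RMSD without alignment, and uses a hypothetical 2.0 Å cluster radius and hypothetical energies.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Coordinate RMSD; assumes equal length and prior superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and equal in length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def leader_cluster(
    decoys: List[Sequence[Coord]],
    energies: List[float],
    radius: float = 2.0,  # hypothetical cutoff in angstroms
) -> List[int]:
    """Greedy leader clustering: visit decoys from lowest to highest energy;
    a decoy becomes a new representative unless it lies within `radius`
    of an existing representative. Returns representative indices."""
    order = sorted(range(len(decoys)), key=lambda i: energies[i])
    reps: List[int] = []
    for i in order:
        if all(rmsd(decoys[i], decoys[r]) > radius for r in reps):
            reps.append(i)
    return reps


# Toy two-atom "structures" for illustration.
d0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
d1 = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
d2 = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
decoys = [d0, d1, d2]
energies = [-10.0, -12.0, -8.0]  # hypothetical scores; lower is better
representatives = leader_cluster(decoys, energies)
print(representatives)  # [1, 2]
```

Visiting decoys in energy order means each cluster is represented by its lowest-energy member; a larger `radius` yields fewer, coarser clusters and a smaller retained ensemble.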
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Parastoo Kamranfar
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
7. Chen R, Yang S, Zhang L, Zhou YJ. Advanced Strategies for Production of Natural Products in Yeast. iScience 2020; 23:100879. [PMID: 32087574 PMCID: PMC7033514 DOI: 10.1016/j.isci.2020.100879] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 12/01/2019] [Revised: 01/27/2020] [Accepted: 01/28/2020] [Indexed: 12/30/2022]
Abstract
Natural products account for more than 50% of all small-molecule pharmaceutical agents currently in clinical use. However, low availability often becomes a problem when a bioactive natural product shows promise as a pharmaceutical or lead compound. Advances in synthetic biology and metabolic engineering provide a feasible route to a sustainable supply of these compounds. In this review, we summarize current progress in engineering yeast cell factories for the production of natural products, including terpenoids, alkaloids, and phenylpropanoids. We then discuss advanced metabolic engineering strategies at three levels, point, line, and plane, corresponding to individual enzymes and cofactors, metabolic pathways, and the global cellular network. In particular, we discuss in detail how engineering cofactor biosynthesis, in addition to improving enzyme activity, can enhance biosynthetic efficiency. Finally, we discuss current challenges and perspectives for future engineering directions.
Affiliation(s)
- Ruibing Chen
- Division of Biotechnology, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; Department of Pharmaceutical Botany, School of Pharmacy, Naval Medical University, Shanghai 200433, China
- Shan Yang
- Division of Biotechnology, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lei Zhang
- Department of Pharmaceutical Botany, School of Pharmacy, Naval Medical University, Shanghai 200433, China; Biomedical Innovation R&D Center, School of Medicine, Shanghai University, Shanghai 200444, China
- Yongjin J Zhou
- Division of Biotechnology, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China; Dalian Key Laboratory of Energy Biotechnology, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China.
8. Zaman AB, Shehu A. Building maps of protein structure spaces in template-free protein structure prediction. J Bioinform Comput Biol 2020; 17:1940013. [DOI: 10.1142/s0219720019400134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
An important goal in template-free protein structure prediction is controlling the quality of the tertiary structures computed for a target amino-acid sequence. Despite great advances in algorithmic research, this task remains exceptionally challenging, given the size, dimensionality, and inherent characteristics of the protein structure space. Current practice is to generate as many structures as can be afforded, so as to increase the likelihood that some of them will lie near the sought but unknown biologically active/native structure. Within a given computational budget, this is impractical, and it is not informed by any metrics of interest. In this paper, we propose instead to equip algorithms that generate tertiary structures, also known as decoy generation algorithms, with a memory of the protein structure space that they explore. Specifically, we propose an evolving, granularity-controllable map of the protein structure space that makes use of low-dimensional representations of protein structures. Evaluations on diverse target sequences, including recent hard CASP targets, show that storage can be drastically reduced without sacrificing decoy quality. The results make the case that integrating a map of the protein structure space is a promising mechanism for enhancing decoy generation algorithms in template-free protein structure prediction.
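One simple way to picture a granularity-controllable map over low-dimensional structure representations is a grid: each decoy's low-dimensional coordinates are binned into cells of a chosen width, and only the lowest-energy decoy seen in each cell is kept. This is an assumption-laden illustration rather than the map proposed in the paper; the `GridMap` class, the `cell_width` parameter, the 2-D toy coordinates, and the keep-lowest-energy rule are all hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class GridMap:
    """Hypothetical grid map: keeps at most one decoy (the lowest-energy one
    seen so far) per cell. A smaller cell_width gives a finer map and
    therefore more stored decoys."""

    def __init__(self, cell_width: float) -> None:
        if cell_width <= 0:
            raise ValueError("cell_width must be positive")
        self.cell_width = cell_width
        self.cells: Dict[Cell, Tuple[float, str]] = {}

    def _cell(self, point: Sequence[float]) -> Cell:
        # Bin each low-dimensional coordinate into an integer cell index.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def add(self, decoy_id: str, point: Sequence[float], energy: float) -> bool:
        """Offer a decoy to the map; return True if it is now stored."""
        key = self._cell(point)
        current = self.cells.get(key)
        if current is None or energy < current[0]:
            self.cells[key] = (energy, decoy_id)
            return True
        return False

    def stored_ids(self) -> Set[str]:
        return {decoy_id for _, decoy_id in self.cells.values()}


grid = GridMap(cell_width=1.0)
grid.add("a", (0.2, 0.3), -5.0)  # new cell (0, 0): stored
grid.add("b", (0.8, 0.9), -7.0)  # same cell, lower energy: replaces "a"
grid.add("c", (0.5, 0.5), -6.0)  # same cell, higher energy: discarded
grid.add("d", (2.5, 0.1), -1.0)  # new cell (2, 0): stored
print(sorted(grid.stored_ids()))  # ['b', 'd']
```

Storage is bounded by the number of occupied cells rather than the number of decoys generated, which is the kind of trade-off the abstract describes; how the low-dimensional representation is computed is outside this sketch.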
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
9. Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over the Energy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022]
Abstract
The energy landscape that organizes the microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form/structure, changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes to relate structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly rugged. Deep wells or basins in them do not always correspond to stable structural states but are instead the result of inherent inaccuracies in semi-empirical molecular energy functions. Due to these challenges, energetics is typically ignored in computational approaches addressing long-standing central questions in computational biology, such as protein decoy selection. In decoy selection, the goal is to determine, among a possibly large number of computationally generated three-dimensional structures of a protein, those structures that are biologically active/native. In recent work, we have refocused our attention on the protein energy landscape and its role in helping us advance decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further advance the argument that the energy landscape holds valuable information to aid and advance the state of protein decoy selection via novel machine learning methodologies that leverage supervised learning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful for making discoveries in studies that aim to relate molecular structure and structural dynamics to function.
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Kazi Lutful Kabir
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.

10
Ahmad S, Prathipati P, Tripathi LP, Chen YA, Arya A, Murakami Y, Mizuguchi K. Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism. Nucleic Acids Res 2019; 46:54-70. [PMID: 29186632 PMCID: PMC5758906 DOI: 10.1093/nar/gkx1166] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/03/2016] [Accepted: 11/15/2017] [Indexed: 12/29/2022]
Abstract
DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire a tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of the top 100 candidate DBPs from hitherto unannotated genes provides further avenues to explore their functional associations.
Affiliation(s)
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Philip Prathipati
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Lokesh P Tripathi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Yi-An Chen
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Ajay Arya
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Yoichi Murakami
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Kenji Mizuguchi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan

11
Abstract
Over 100 whole-genome sequences from algae are published or soon to be published. The rapidly increasing availability of these fundamental resources is changing how we understand one of the most diverse, complex, and understudied groups of photosynthetic eukaryotes. Genome sequences provide a window into the functional potential of individual algae, with phylogenomics and functional genomics as tools for contextualizing and transferring knowledge from reference organisms into less well-characterized systems. Remarkably, over half of the proteins encoded by algal genomes are of unknown function, highlighting the volume of functional capabilities yet to be discovered. In this review, we provide an overview of publicly available algal genomes, their associated protein inventories, and their quality, with a summary of the statuses of protein function understanding and predictions.
Affiliation(s)
- Sabeeha S Merchant
- Departments of Plant and Microbial Biology and Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Institute for Genomics and Proteomics, University of California, Los Angeles, California 90095, USA

12
Zaman AB, Shehu A. Balancing multiple objectives in conformation sampling to control decoy diversity in template-free protein structure prediction. BMC Bioinformatics 2019; 20:211. [PMID: 31023237 PMCID: PMC6485169 DOI: 10.1186/s12859-019-2794-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/07/2019] [Accepted: 04/04/2019] [Indexed: 12/05/2022]
Abstract
Background: Computational approaches for the determination of biologically-active/native three-dimensional structures of proteins with novel sequences have to handle several challenges. The (conformation) space of possible three-dimensional spatial arrangements of the chain of amino acids that constitute a protein molecule is vast and high-dimensional. Exploration of the conformation spaces is performed in a sampling-based manner and is biased by the internal energy that sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and associate with protein conformation spaces overly rugged energy surfaces riddled with artifact local minima. The response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys) as a way of increasing the likelihood of having a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations.
Results: In this paper we pursue a complementary approach and propose to directly control the diversity of generated decoys. Inspired by hard optimization problems in high-dimensional and non-linear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase here an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and is also able to access lower-energy regions of the energy landscape of a given protein with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms.
Conclusions: The presented results constitute a promising research direction in improving decoy generation for template-free protein structure prediction with regard to balancing multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are directions of research that attenuate dependence on protein energy models.
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA.
- Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA.
- School of Systems Biology, George Mason University, Manassas, 20110, VA, USA

13
Characterization of L-Carnitine Metabolism in Sinorhizobium meliloti. J Bacteriol 2019; 201:JB.00772-18. [PMID: 30670548 DOI: 10.1128/jb.00772-18] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/12/2018] [Accepted: 01/15/2019] [Indexed: 11/20/2022]
Abstract
L-Carnitine is a trimethylammonium compound mostly known for its contribution to fatty acid transport into mitochondria. In bacteria, it is synthesized from γ-butyrobetaine (GBB) and can be used as a carbon source. L-Carnitine can be formed directly by GBB hydroxylation or synthesized via a biosynthetic route analogous to fatty acid degradation. However, this multistep pathway has not been experimentally characterized. In this work, we identified by gene context analysis a cluster of L-carnitine anabolic genes next to those involved in its catabolism and proceeded to the complete in vitro characterization of L-carnitine biosynthesis and degradation in Sinorhizobium meliloti. The five enzymes catalyzing the seven steps that convert GBB to glycine betaine are described. Metabolomic analysis confirmed the multistage synthesis of L-carnitine in GBB-grown cells but also revealed that GBB is synthesized by S. meliloti. To our knowledge, this is the first report of aerobic GBB synthesis in bacteria. The conservation of L-carnitine metabolism genes in different bacterial taxonomic classes underscores the role of L-carnitine as a ubiquitous nutrient.
IMPORTANCE: The experimental characterization of novel metabolic pathways is essential for realizing the value of genome sequences and improving our knowledge of the enzymatic capabilities of the bacterial world. However, 30% to 40% of genes of a typical genome remain unannotated or associated with a putative function. We used enzyme kinetics, liquid chromatography-mass spectrometry (LC-MS)-based metabolomics, and mutant phenotyping for the characterization of the metabolism of L-carnitine in Sinorhizobium meliloti to provide an accurate annotation of the corresponding genes. The occurrence of conserved gene clusters for carnitine metabolism in soil, plant-associated, and marine bacteria underlines the environmental abundance of carnitine and suggests this molecule might make a significant contribution to ecosystem nitrogen and carbon cycling.

14
de Crécy-Lagard V, Haas D, Hanson AD. Newly-discovered enzymes that function in metabolite damage-control. Curr Opin Chem Biol 2018; 47:101-108. [PMID: 30268903 DOI: 10.1016/j.cbpa.2018.09.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/03/2018] [Revised: 08/19/2018] [Accepted: 09/11/2018] [Indexed: 01/26/2023]
Abstract
Enzymes of unknown function are estimated to make up around 25% of the sequenced proteome. In the past decade, over 20 conserved families have been shown to function in the metabolism of 'damaged' or abnormal metabolites that are wasteful and often toxic. These newly discovered damage-control enzymes either repair or inactivate the offending metabolites, or pre-empt their formation in the first place. Comparative genomics has been of prime importance in predicting the functions of damage-control enzymes and in guiding the biochemical and genetic tests required to validate these functions.
Affiliation(s)
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA.
- Drago Haas
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Andrew D Hanson
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA

15
An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction. COMPUTATION 2018. [DOI: 10.3390/computation6020039] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]

16
From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction. Molecules 2018; 23:molecules23010216. [PMID: 29351266 PMCID: PMC6017496 DOI: 10.3390/molecules23010216] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/08/2017] [Accepted: 12/11/2017] [Indexed: 11/17/2022]
Abstract
Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.

17
Ellens KW, Christian N, Singh C, Satagopam VP, May P, Linster CL. Confronting the catalytic dark matter encoded by sequenced genomes. Nucleic Acids Res 2017; 45:11495-11514. [PMID: 29059321 PMCID: PMC5714238 DOI: 10.1093/nar/gkx937] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 01/18/2017] [Accepted: 10/03/2017] [Indexed: 01/02/2023]
Abstract
The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim at determining how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the ‘unknown enzyme problem’. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research.
Affiliation(s)
- Kenneth W Ellens
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Nils Christian
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Charandeep Singh
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Venkata P Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Carole L Linster
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg

18
Caufield JH, Wimble C, Shary S, Wuchty S, Uetz P. Bacterial protein meta-interactomes predict cross-species interactions and protein function. BMC Bioinformatics 2017; 18:171. [PMID: 28298180 PMCID: PMC5353844 DOI: 10.1186/s12859-017-1585-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/15/2016] [Accepted: 03/04/2017] [Indexed: 11/24/2022]
Abstract
Background: Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactome studies while several more have had substantial segments of their proteomes screened for interactions. The protein interactomes of several bacterial species have been completed, including several from prominent human pathogens. The availability of interactome data has brought challenges, as these large data sets are difficult to compare across species, limiting their usefulness for broad studies of microbial genetics and evolution.
Results: In this study, we use more than 52,000 unique protein-protein interactions (PPIs) across 349 different bacterial species and strains to determine their conservation across data sets and taxonomic groups. When proteins are collapsed into orthologous groups (OGs) the resulting meta-interactome still includes more than 43,000 interactions, about 14,000 of which involve proteins of unknown function. While conserved interactions provide support for protein function in their respective species data, we found only 429 PPIs (~1% of the available data) conserved in two or more species, rendering any cross-species interactome comparison immediately useful. The meta-interactome serves as a model for predicting interactions, protein functions, and even full interactome sizes for species with limited to no experimentally observed PPI, including Bacillus subtilis and Salmonella enterica which are predicted to have up to 18,000 and 31,000 PPIs, respectively.
Conclusions: In the course of this work, we have assembled cross-species interactome comparisons that will allow interactomics researchers to anticipate the structures of yet-unexplored microbial interactomes and to focus on well-conserved yet uncharacterized interactors for further study. Such conserved interactions should provide evidence for important but yet-uncharacterized aspects of bacterial physiology and may provide targets for anti-microbial therapies.
Affiliation(s)
- J Harry Caufield
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, USA
- Christopher Wimble
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, USA
- Semarjit Shary
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, USA
- Stefan Wuchty
- Department of Computer Science, University of Miami, Coral Gables, Florida, USA.
- Center for Computational Science, University of Miami, Coral Gables, Florida, USA.
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, Florida, USA
- Peter Uetz
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, USA.

19
Sévin DC, Fuhrer T, Zamboni N, Sauer U. Nontargeted in vitro metabolomics for high-throughput identification of novel enzymes in Escherichia coli. Nat Methods 2016; 14:187-194. [DOI: 10.1038/nmeth.4103] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 04/22/2016] [Accepted: 10/19/2016] [Indexed: 12/14/2022]

20
Thiaville JJ, Flood J, Yurgel S, Prunetti L, Elbadawi-Sidhu M, Hutinet G, Forouhar F, Zhang X, Ganesan V, Reddy P, Fiehn O, Gerlt JA, Hunt JF, Copley SD, de Crécy-Lagard V. Members of a Novel Kinase Family (DUF1537) Can Recycle Toxic Intermediates into an Essential Metabolite. ACS Chem Biol 2016; 11:2304-11. [PMID: 27294475 DOI: 10.1021/acschembio.6b00279] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Abstract
DUF1537 is a novel family of kinases identified by comparative genomic approaches. The family is widespread and found in all sequenced plant genomes and 16% of sequenced bacterial genomes. DUF1537 is not a monofunctional family and contains subgroups that can be separated by phylogenetic and genome neighborhood context analyses. A subset of the DUF1537 proteins is strongly associated by physical clustering and gene fusion with the PdxA2 family, demonstrated here to be a functional paralog of the 4-phosphohydroxy-L-threonine dehydrogenase enzyme (PdxA), a central enzyme in the synthesis of pyridoxal-5'-phosphate (PLP) in proteobacteria. Some members of this DUF1537 subgroup phosphorylate L-4-hydroxythreonine (4HT) into 4-phosphohydroxy-L-threonine (4PHT), the substrate of PdxA, in vitro and in vivo. This provides an alternative route to PLP from the toxic antimetabolite 4HT that can be directly generated from the toxic intermediate glycolaldehyde. Although the kinetic and physical clustering data indicate that these functions in PLP synthesis are not the main roles of the DUF1537-PdxA2 enzymes, genetic and physiological data suggest that these side activities have been maintained in diverse sets of organisms.
Affiliation(s)
- Jennifer J. Thiaville
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Jake Flood
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, United States
- Svetlana Yurgel
- Dalhousie University, 6299 South St., Halifax, NS B3H 4R2, Canada
- Laurence Prunetti
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Geoffrey Hutinet
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Farhad Forouhar
- Department of Biological Sciences, Columbia University, New York, New York, United States
- Xinshuai Zhang
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Venkateswaran Ganesan
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Patrick Reddy
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Oliver Fiehn
- West Coast Metabolomics Center, UC Davis, Davis, California, United States
- King Abdulaziz University, Biochemistry Department, Jeddah, Saudi Arabia
- J. A. Gerlt
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- John F. Hunt
- Department of Biological Sciences, Columbia University, New York, New York, United States
- Shelley D. Copley
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, United States
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States

21
Baltrus DA. Divorcing Strain Classification from Species Names. Trends Microbiol 2016; 24:431-439. [PMID: 26947794 DOI: 10.1016/j.tim.2016.02.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/16/2015] [Revised: 01/29/2016] [Accepted: 02/04/2016] [Indexed: 02/01/2023]
Abstract
Confusion about strain classification and nomenclature permeates modern microbiology. Although taxonomists have traditionally acted as gatekeepers of order, the numbers of, and speed at which, new strains are identified have outpaced the opportunity for professional classification for many lineages. Furthermore, the growth of bioinformatics and database-fueled investigations have placed metadata curation in the hands of researchers with little taxonomic experience. Here I describe practical challenges facing modern microbial taxonomy, provide an overview of complexities of classification for environmentally ubiquitous taxa like Pseudomonas syringae, and emphasize that classification can be independent of nomenclature. A move toward implementation of relational classification schemes based on inherent properties of whole genomes could provide sorely needed continuity in how strains are referenced across manuscripts and data sets.
Affiliation(s)
- David A Baltrus
- School of Plant Sciences, University of Arizona, Tucson, AZ, USA.

22
Merlet B, Paulhe N, Vinson F, Frainay C, Chazalviel M, Poupin N, Gloaguen Y, Giacomoni F, Jourdan F. A Computational Solution to Automatically Map Metabolite Libraries in the Context of Genome Scale Metabolic Networks. Front Mol Biosci 2016; 3:2. [PMID: 26909353 PMCID: PMC4754433 DOI: 10.3389/fmolb.2016.00002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/31/2015] [Accepted: 01/25/2016] [Indexed: 11/13/2022]
Abstract
This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities, MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP), on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing coverage statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues of programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in the genome-scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single organism or a selection of organisms of interest and is thus not limited to large facilities.
Affiliation(s)
- Benjamin Merlet
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France
- Nils Paulhe
- Nutrition Humaine, Plateforme d'Exploration du Métabolisme, Institut National de la Recherche Agronomique, Centre Clermont-Ferrand-Theix, UMR 1019, Saint-Genès-Champanelle, France
- Florence Vinson
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France
- Clément Frainay
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France
- Maxime Chazalviel
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France
- Nathalie Poupin
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France
- Yoann Gloaguen
- Glasgow Polyomics, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Franck Giacomoni
- Nutrition Humaine, Plateforme d'Exploration du Métabolisme, Institut National de la Recherche Agronomique, Centre Clermont-Ferrand-Theix, UMR 1019, Saint-Genès-Champanelle, France
- Fabien Jourdan
- TOXALIM (Research Centre in Food Toxicology), Institut National de la Recherche Agronomique, UMR1331, Université de Toulouse, Toulouse, France

23
Kuznetsova E, Nocek B, Brown G, Makarova KS, Flick R, Wolf YI, Khusnutdinova A, Evdokimova E, Jin K, Tan K, Hanson AD, Hasnain G, Zallot R, de Crécy-Lagard V, Babu M, Savchenko A, Joachimiak A, Edwards AM, Koonin EV, Yakunin AF. Functional Diversity of Haloacid Dehalogenase Superfamily Phosphatases from Saccharomyces cerevisiae: Biochemical, Structural, and Evolutionary Insights. J Biol Chem 2015; 290:18678-98. [PMID: 26071590 DOI: 10.1074/jbc.m115.657916] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 04/10/2015] [Indexed: 12/15/2022]
Abstract
The haloacid dehalogenase (HAD)-like enzymes comprise a large superfamily of phosphohydrolases present in all organisms. The Saccharomyces cerevisiae genome encodes at least 19 soluble HADs, including 10 uncharacterized proteins. Here, we biochemically characterized 13 yeast phosphatases from the HAD superfamily, which includes both specific and promiscuous enzymes active against various phosphorylated metabolites and peptides with several HADs implicated in detoxification of phosphorylated compounds and pseudouridine. The crystal structures of four yeast HADs provided insight into their active sites, whereas the structure of the YKR070W dimer in complex with substrate revealed a composite substrate-binding site. Although the S. cerevisiae and Escherichia coli HADs share low sequence similarities, the comparison of their substrate profiles revealed seven phosphatases with common preferred substrates. The cluster of secondary substrates supporting significant activity of both S. cerevisiae and E. coli HADs includes 28 common metabolites that appear to represent the pool of potential activities for the evolution of novel HAD phosphatases. Evolution of novel substrate specificities of HAD phosphatases shows no strict correlation with sequence divergence. Thus, evolution of the HAD superfamily combines the conservation of the overall substrate pool and the substrate profiles of some enzymes with remarkable biochemical and structural flexibility of other superfamily members.
Affiliation(s)
- Ekaterina Kuznetsova
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Boguslaw Nocek
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Greg Brown
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Robert Flick
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Anna Khusnutdinova
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Elena Evdokimova
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Ke Jin
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan S4S 0A2, Canada
- Kemin Tan
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Andrew D Hanson
- Horticultural Sciences Department, Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611
- Ghulam Hasnain
- Horticultural Sciences Department, Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611
- Rémi Zallot
- Horticultural Sciences Department, Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611
- Valérie de Crécy-Lagard
- Horticultural Sciences Department, Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611
- Mohan Babu
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan S4S 0A2, Canada
- Alexei Savchenko
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Andrzej Joachimiak
- Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Aled M Edwards
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada; Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Alexander F Yakunin
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
24
Vester JK, Glaring MA, Stougaard P. Improved cultivation and metagenomics as new tools for bioprospecting in cold environments. Extremophiles 2014; 19:17-29. [PMID: 25399309 PMCID: PMC4272415 DOI: 10.1007/s00792-014-0704-3] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 08/11/2014] [Accepted: 10/28/2014] [Indexed: 11/28/2022]
Abstract
Only a small minority of microorganisms from an environmental sample can be cultured in the laboratory, leaving the enormous bioprospecting potential of the uncultured diversity unexplored. This resource can be accessed by improved cultivation methods in which the natural environment is brought into the laboratory, or through metagenomic approaches in which culture-independent DNA sequence information can be combined with functional screening. The coupling of these two approaches circumvents the need for pure, cultured isolates and can be used to generate targeted information on communities enriched for specific activities or properties. Bioprospecting in extreme environments is often associated with additional challenges such as low biomass, slow cell growth, complex sample matrices, restricted access, and problematic in situ analyses. In addition, the choice of vector system and expression host may be limited, as few hosts are available for expression of genes with extremophilic properties. This review summarizes the methods developed for improved cultivation as well as the metagenomic approaches for bioprospecting, with a focus on the challenges faced by bioprospecting in cold environments.
Affiliation(s)
- Jan Kjølhede Vester
- Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871, Frederiksberg C, Denmark
25
El Yacoubi B, de Crécy-Lagard V. Integrative data-mining tools to link gene and function. Methods Mol Biol 2014; 1101:43-66. [PMID: 24233777 DOI: 10.1007/978-1-62703-721-1_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/10/2023]
Abstract
Information derived from genomic and post-genomic data can be efficiently used to link gene and function. Several web-based platforms have been developed to mine these types of data by integrating different tools. This method paper is designed to allow the user to navigate these platforms in order to make functional predictions. The main focus is on phylogenetic distribution and physical clustering tools, but other tools such as pathway reconstruction, gene fusions, and analysis of high-throughput experimental data are also surveyed.
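Phylogenetic-distribution tools of the kind surveyed in this chapter commonly rely on phylogenetic profiling: genes whose presence/absence patterns across a set of genomes match are candidates for a functional link. The following is a minimal Python sketch of that idea only, assuming hypothetical gene names and presence/absence calls and using a plain Hamming-distance cutoff; the web platforms discussed in the chapter may score profiles differently.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) calls for four genes across six genomes.
# These values are illustration data, not results from the cited chapter.
PROFILES = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (1, 0, 1, 1, 0, 0),
    "geneD": (0, 1, 1, 0, 1, 0),
}


def hamming(p, q):
    """Count the genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        if hamming(p1, p2) <= max_distance:
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    print(linked_pairs(PROFILES))  # [('geneA', 'geneB')]
```

Raising `max_distance` trades precision for recall: with a cutoff of 3, geneC is also paired with geneA and geneB, which shows why real tools weight or normalize profile distances rather than using a raw count.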
Affiliation(s)
- Basma El Yacoubi
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
26
Blumer-Schuette SE, Brown SD, Sander KB, Bayer EA, Kataeva I, Zurawski JV, Conway JM, Adams MWW, Kelly RM. Thermophilic lignocellulose deconstruction. FEMS Microbiol Rev 2014; 38:393-448. [DOI: 10.1111/1574-6976.12044] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Received: 11/05/2012] [Revised: 08/20/2013] [Accepted: 08/28/2013] [Indexed: 11/28/2022]
27
Induced Genetic Variation, TILLING and NGS-Based Cloning. Biotechnological Approaches to Barley Improvement 2014. [DOI: 10.1007/978-3-662-44406-1_15] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
28
Londin E, Yadav P, Surrey S, Kricka LJ, Fortina P. Use of linkage analysis, genome-wide association studies, and next-generation sequencing in the identification of disease-causing mutations. Methods Mol Biol 2013; 1015:127-46. [PMID: 23824853 DOI: 10.1007/978-1-62703-435-7_8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]
Abstract
For the past two decades, linkage analysis and genome-wide analysis have greatly advanced our knowledge of the human genome. Despite these successes, however, the genetic architecture of diseases remains unknown. More recently, the availability of next-generation sequencing has dramatically increased our capability for determining DNA sequences that range from large portions of one individual's genome to targeted regions of many genomes in a cohort of interest. In this review, we highlight the successes and shortcomings of using genome-wide association studies (GWAS) to identify the variants contributing to disease. We further review the methods and uses of new technologies, based on next-generation sequencing, that are increasingly used to expand our knowledge of the causes of genetic disease.
Affiliation(s)
- Eric Londin
- Computational Medicine Center, Thomas Jefferson University Jefferson Medical College, Philadelphia, PA, USA
29
Blais EM, Chavali AK, Papin JA. Linking genome-scale metabolic modeling and genome annotation. Methods Mol Biol 2013; 985:61-83. [PMID: 23417799 DOI: 10.1007/978-1-62703-299-5_4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
Genome-scale metabolic network reconstructions, assembled from annotated genomes, serve as a platform for integrating data from heterogeneous sources and generating hypotheses for further experimental validation. Implementing constraint-based modeling techniques such as flux balance analysis (FBA) on network reconstructions allows for interrogating metabolism at a systems level, which aids in identifying and rectifying gaps in knowledge. With genome sequences for various organisms from prokaryotes to eukaryotes becoming increasingly available, a significant bottleneck lies in the structural and functional annotation of these sequences. Using topologically based and biologically inspired metabolic network refinement, we can better characterize enzymatic functions present in an organism and link annotation of these functions to candidate transcripts; both steps can be experimentally validated.
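Flux balance analysis, as named in this abstract, maximizes an objective flux (typically biomass) subject to the steady-state constraint S·v = 0 and per-reaction flux bounds. Real FBA tools solve this as a linear program; the Python sketch below instead enumerates integer flux vectors on a hypothetical four-reaction toy network, an assumption that only works because the network is tiny. The network, bounds, and names are illustrative and not taken from the cited chapter.

```python
from itertools import product

# Hypothetical toy network. Columns = reactions:
#   v1: uptake -> A,  v2: A -> B,  v3: A -> C,  v4: B + C -> biomass
# Rows = internal metabolites A, B, C.
S = [
    [1, -1, -1, 0],   # A: produced by v1, consumed by v2 and v3
    [0, 1, 0, -1],    # B: produced by v2, consumed by v4
    [0, 0, 1, -1],    # C: produced by v3, consumed by v4
]
UPPER = [10, 10, 10, 10]  # assumed flux upper bounds; every lower bound is 0
OBJECTIVE = 3             # index of the biomass reaction v4


def is_steady_state(S, v):
    """True when S @ v == 0, i.e. no internal metabolite accumulates or depletes."""
    return all(sum(s * f for s, f in zip(row, v)) == 0 for row in S)


def brute_force_fba(S, upper, objective):
    """Maximize v[objective] over integer flux vectors within bounds at steady state.

    Exhaustive enumeration stands in for the linear-programming solver that a
    real FBA tool would use; it only scales to toy networks like this one.
    """
    best = None
    for v in product(*(range(u + 1) for u in upper)):
        if is_steady_state(S, v) and (best is None or v[objective] > best[objective]):
            best = v
    return best


if __name__ == "__main__":
    print(brute_force_fba(S, UPPER, OBJECTIVE))  # (10, 5, 5, 5)
```

Because the uptake flux v1 must feed both the B and C branches, the optimum uses the full uptake bound of 10 and yields a biomass flux of 5; a gap in the network (for example, deleting v3) would drop the achievable biomass flux to 0, which is the kind of signal used to flag missing annotations.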
Affiliation(s)
- Edik M Blais
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
30
Automatic assignment of prokaryotic genes to functional categories using literature profiling. PLoS One 2012; 7:e47436. [PMID: 23077617 PMCID: PMC3471813 DOI: 10.1371/journal.pone.0047436] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/04/2012] [Accepted: 09/17/2012] [Indexed: 11/19/2022]
Abstract
In recent years, there has been an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack the financial support to review their annotations. A few of these gene annotations are based on a combination of bioinformatics evidence; however, in most cases, annotations are based solely on sequence similarity to a previously known gene, which was most probably annotated in the same way. As a result, a large number of predicted genes remain unassigned to any functional category, despite there being enough evidence in the literature to predict their function. We developed a classifier trained with term-frequency vectors automatically derived from text corpora of an ensemble of genes representative of each functional category of the J. Craig Venter Institute Comprehensive Microbial Resource (JCVI-CMR) ontology. On an independent set of 2,220 genes from 13 bacterial species, previously classified by JCVI-CMR into unambiguous categories of its ontology, the classifier achieved up to 84% precision with 68% recall (for confidence ≥ 0.4) and an F-measure of 0.76 (recall and precision equally weighted). Finally, the classifier assigned (confidence ≥ 0.7) a total of 5,235 of the ∼24 thousand genes previously in the categories "Unknown function" or "Unclassified" for which there is literature in MEDLINE to functional categories. Two biologists reviewed the literature for 100 randomly picked genes from this set and assigned them to the same functional categories predicted by the automatic classifier. Our results confirmed the hypothesis that it is possible to confidently assign genes of a real-world repository to functional categories based exclusively on the automatic profiling of their associated literature. The LitProf--Gene Classifier web server is accessible at: www.cebio.org/litprofGC.
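The F-measure quoted above is the harmonic mean of precision and recall (F1, with both equally weighted). A minimal Python sketch of the general F-beta formula, applied to the rounded figures from the abstract, is:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Rounded precision (84%) and recall (68%) as reported in the abstract.
    print(round(f_measure(0.84, 0.68), 3))  # 0.752
```

With the rounded percentages, F1 comes out at about 0.75; the reported 0.76 presumably reflects unrounded precision and recall values, which are not given in the abstract.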
31
de Crécy-Lagard V, Phillips G, Grochowski LL, El Yacoubi B, Jenney F, Adams MWW, Murzin AG, White RH. Comparative genomics guided discovery of two missing archaeal enzyme families involved in the biosynthesis of the pterin moiety of tetrahydromethanopterin and tetrahydrofolate. ACS Chem Biol 2012; 7:1807-16. [PMID: 22931285 PMCID: PMC3500442 DOI: 10.1021/cb300342u] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/03/2022]
Abstract
C-1 carriers are essential cofactors in all domains of life, and in Archaea, these can be derivatives of tetrahydromethanopterin (H4-MPT) or tetrahydrofolate (H4-folate). Their synthesis requires 6-hydroxymethyl-7,8-dihydropterin diphosphate (6-HMDP) as the precursor, but the nature of the pathways that lead to its formation was unknown until the recent discovery of the GTP cyclohydrolase IB/MptA family that catalyzes the first step, the conversion of GTP to dihydroneopterin 2′,3′-cyclic phosphate or 7,8-dihydroneopterin triphosphate [El Yacoubi, B.; et al. (2006) J. Biol. Chem. 281, 37586–37593 and Grochowski, L. L.; et al. (2007) Biochemistry 46, 6658–6667]. Using a combination of comparative genomics analyses, heterologous complementation tests, and in vitro assays, we show that the archaeal protein families COG2098 and COG1634 specify two of the missing 6-HMDP synthesis enzymes. Members of the COG2098 family catalyze the formation of 6-hydroxymethyl-7,8-dihydropterin from 7,8-dihydroneopterin, while members of the COG1634 family catalyze the formation of 6-HMDP from 6-hydroxymethyl-7,8-dihydropterin. The discovery of these missing genes solves a long-standing mystery and provides novel examples of convergent evolution in which proteins of dissimilar architectures perform the same biochemical function.
Affiliation(s)
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Gabriela Phillips
- Department of Microbiology and Cell Science, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Laura L. Grochowski
- Department of Biochemistry (0308), Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, United States
- Basma El Yacoubi
- Department of Microbiology and Cell Science, University of Florida, P.O. Box 110700, Gainesville, Florida 32611-0700, United States
- Francis Jenney
- Department of Basic Sciences, Georgia Campus, Philadelphia College of Osteopathic Medicine, Suwanee, Georgia 30024, United States
- Michael W. W. Adams
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, United States
- Alexey G. Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, U.K.
- Robert H. White
- Department of Biochemistry (0308), Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, United States
32
Dozmorov MG, Giles CB, Wren JD. Predicting gene ontology from a global meta-analysis of 1-color microarray experiments. BMC Bioinformatics 2011; 12 Suppl 10:S14. [PMID: 22166114 PMCID: PMC3236836 DOI: 10.1186/1471-2105-12-s10-s14] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
Affiliation(s)
- Mikhail G Dozmorov
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation 825 NE 13th Street, Oklahoma City, Oklahoma 73104-5005, USA