1
Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well-documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which cause individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
- The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eduardo Rosa-Molinar
- Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
- Robert E Ward
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Hongbin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Breeanna R Urbanowicz
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- A Mark Settles
- Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA, USA
2
Ellens KW, Christian N, Singh C, Satagopam VP, May P, Linster CL. Confronting the catalytic dark matter encoded by sequenced genomes. Nucleic Acids Res 2017; 45:11495-11514. [PMID: 29059321 PMCID: PMC5714238 DOI: 10.1093/nar/gkx937] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 01/18/2017] [Accepted: 10/03/2017] [Indexed: 01/02/2023]
Abstract
The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim to determine how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the ‘unknown enzyme problem’. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism, as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research.
Affiliation(s)
- Kenneth W Ellens
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Nils Christian
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Charandeep Singh
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Venkata P Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Carole L Linster
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
3
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022]
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Katherine J Harrison
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
4
Antal B, Chessel A, Carazo Salas RE. Mineotaur: a tool for high-content microscopy screen sharing and visual analytics. Genome Biol 2015; 16:283. [PMID: 26679168 PMCID: PMC4699365 DOI: 10.1186/s13059-015-0836-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
High-throughput/high-content microscopy-based screens are powerful tools for functional genomics, yielding intracellular information down to the level of single cells for thousands of genotypic conditions. However, accessing their data requires specialized knowledge, and most often the data are no longer analyzed after initial publication. We describe Mineotaur (http://www.mineotaur.org), an open-source, downloadable web application that allows easy online sharing and interactive visualisation of large screen datasets, facilitating their dissemination and further analysis and enhancing their impact.
Affiliation(s)
- Bálint Antal
- Genetics Department, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Anatole Chessel
- Genetics Department, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Rafael E Carazo Salas
- Genetics Department, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
5
Niehaus TD, Thamm AMK, de Crécy-Lagard V, Hanson AD. Proteins of Unknown Biochemical Function: A Persistent Problem and a Roadmap to Help Overcome It. Plant Physiol 2015; 169:1436-1442. [PMID: 26269542 PMCID: PMC4634069 DOI: 10.1104/pp.15.00959] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/22/2015] [Accepted: 08/11/2015] [Indexed: 05/03/2023]
Abstract
The number of sequenced genomes is rapidly increasing, but functional annotation of the genes in these genomes lags far behind. Even in Arabidopsis (Arabidopsis thaliana), only approximately 40% of enzyme- and transporter-encoding genes have credible functional annotations, and this number is even lower in nonmodel plants. Functional characterization of unknown genes is a challenge, but various databases (e.g. for protein localization and coexpression) can be mined to provide clues. If homologous microbial genes exist (and about one-half of the genes encoding unknown enzymes and transporters in Arabidopsis have microbial homologs), cross-kingdom comparative genomics can powerfully complement plant-based data. Multiple lines of evidence can strengthen predictions and warrant experimental characterization. In some cases, relatively quick tests in genetically tractable microbes can determine whether a prediction merits biochemical validation, which is costly and demands specialized skills.
Affiliation(s)
- Thomas D Niehaus
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Antje M K Thamm
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Valérie de Crécy-Lagard
- Microbiology and Cell Science Department, University of Florida, Gainesville, FL 32611, USA
- Andrew D Hanson
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA