1. Tararina MA, Allen KN. Bioinformatic Analysis of the Flavin-Dependent Amine Oxidase Superfamily: Adaptations for Substrate Specificity and Catalytic Diversity. J Mol Biol 2020; 432:3269-3288. [PMID: 32198115 DOI: 10.1016/j.jmb.2020.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/20/2020] [Revised: 02/24/2020] [Accepted: 03/06/2020] [Indexed: 12/29/2022]
Abstract
The flavin-dependent amine oxidase (FAO) superfamily consists of over 9000 nonredundant sequences represented in all domains of life. Of the thousands of members identified, only 214 have been functionally annotated to date, and 40 unique structures are represented in the Protein Data Bank. The few functionally characterized members share a catalytic mechanism involving the oxidation of an amine substrate through transfer of a hydride to the FAD cofactor, with differences observed in substrate specificities. Previous studies have focused on comparing a subset of superfamily members. Here, we present a comprehensive analysis of the FAO superfamily based on reaction mechanism and substrate recognition. Using a dataset of 9192 sequences, a sequence similarity network, and subsequently, a genome neighborhood network were constructed, organizing the superfamily into eight subgroups that accord with substrate type. Likewise, through phylogenetic analysis, the evolutionary relationship of subgroups was determined, delineating the divergence between enzymes based on organism, substrate, and mechanism. In addition, using sequences and atomic coordinates of 22 structures from the Protein Data Bank to perform sequence and structural alignments, active-site elements were identified, showing divergence from the canonical aromatic-cage residues to accommodate large substrates. These specificity determinants are held in a structural framework comprising a core domain catalyzing the oxidation of amines with an auxiliary domain for substrate recognition. Overall, analysis of the FAO superfamily reveals a modular fold with cofactor and substrate-binding domains allowing for diversity of recognition via insertion/deletions. This flexibility allows facile evolution of new activities, as shown by reinvention of function between subfamilies.
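As a rough illustration of the sequence similarity network idea mentioned in the abstract, the sketch below treats sequences as nodes, joins two nodes when a pairwise score meets a threshold, and reads off connected components as candidate subgroups. The scores, the threshold, and the names `build_ssn` and `clusters` are hypothetical; this is not the authors' pipeline or the interface of any particular tool.

```python
from collections import defaultdict


def build_ssn(scores, threshold):
    """Build an undirected graph: an edge joins two sequences whose score >= threshold."""
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so that isolated sequences still appear as nodes.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return connected components as sorted lists, ordered by their first member."""
    seen = set()
    result = []
    for node in sorted(graph):
        if node in seen:
            continue
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Made-up pairwise similarity scores for four hypothetical sequences.
toy_scores = {
    ("seqA", "seqB"): 0.82,
    ("seqA", "seqC"): 0.15,
    ("seqA", "seqD"): 0.10,
    ("seqB", "seqC"): 0.20,
    ("seqB", "seqD"): 0.12,
    ("seqC", "seqD"): 0.77,
}

subgroups = clusters(build_ssn(toy_scores, threshold=0.5))
print(subgroups)
```

Raising the threshold splits the network into more, smaller clusters, and lowering it merges them, so the choice of cutoff largely determines how many subgroups appear.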
Affiliation(s)
- Margarita A Tararina
  - Program in Biomolecular Pharmacology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Karen N Allen
  - Program in Biomolecular Pharmacology, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA; Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, MA 02215, USA
2. Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B, San Francisco B, Solbiati J, Steves A, Brown S, Akiva E, Barber A, Seidel RD, Babbitt PC, Almo SC, Gerlt JA, Jacobson MP. Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks. eLife 2014; 3. [PMID: 24980702 PMCID: PMC4113996 DOI: 10.7554/elife.03275] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 05/05/2014] [Accepted: 06/26/2014] [Indexed: 01/10/2023]
Abstract
Metabolic pathways in eubacteria and archaea often are encoded by operons and/or gene clusters (genome neighborhoods) that provide important clues for assignment of both enzyme functions and metabolic pathways. We describe a bioinformatic approach (genome neighborhood network; GNN) that enables large-scale prediction of the in vitro enzymatic activities and in vivo physiological functions (metabolic pathways) of uncharacterized enzymes in protein families. We demonstrate the utility of the GNN approach by predicting in vitro activities and in vivo functions in the proline racemase superfamily (PRS; InterPro IPR008794). The predictions were verified by measuring in vitro activities for 51 proteins in 12 families in the PRS that represent ∼85% of the sequences; in vitro activities of pathway enzymes, carbon/nitrogen source phenotypes, and/or transcriptomic studies confirmed the predicted pathways. The synergistic use of sequence similarity networks and GNNs will facilitate the discovery of the components of novel, uncharacterized metabolic pathways in sequenced genomes. DOI: http://dx.doi.org/10.7554/eLife.03275.001

DNA molecules are polymers in which four nucleotides (guanine, adenine, thymine, and cytosine) are arranged along a sugar backbone. The sequence of these four nucleotides along the DNA strand determines the genetic code of the organism, and can be deciphered using various genome sequencing techniques. Microbial genomes are particularly easy to sequence as they contain fewer than several million nucleotides, compared with the 3 billion or so nucleotides that are present in the human genome. Reading a genome sequence is straightforward, but predicting the physiological functions of the proteins encoded by the genes in the sequence can be challenging. In a process called genome annotation, the function of a protein is predicted by comparing the relevant gene to the genes of proteins with known functions. However, microbial genomes and proteins are hugely diverse and over 50% of the microbial genomes that have been sequenced have not yet been related to any physiological function. With thousands of microbial genomes waiting to be deciphered, large-scale approaches are needed.

Zhao et al. take advantage of a particular characteristic of microbial genomes. DNA sequences that code for two proteins required for the same task tend to be closer to each other in the genome than two sequences that code for unrelated functions. Operons are an extreme example; an operon is a unit of DNA that contains several genes that are expressed as proteins at the same time. Zhao et al. have developed a bioinformatic method called the genome neighbourhood network approach to work out the function of proteins based on their position relative to other proteins in the genome. When applied to the proline racemase superfamily (PRS), which contains enzymes with similar sequences that can catalyze three distinct chemical reactions, the new approach was able to assign a function to the majority of proteins in a public database of PRS enzymes, and also revealed new members of the PRS family. Experiments confirmed that the proteins behaved as predicted. The next challenge is to develop the genome neighbourhood network approach so that it can be applied to more complex systems. DOI: http://dx.doi.org/10.7554/eLife.03275.002
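The genome-neighborhood idea described above can be sketched in a few lines: for each query gene, look at the genes within a small window on either side and tally which protein families recur next to each query cluster. Everything in this sketch is an assumption made for illustration, including the toy genomes, the family labels, the cluster assignments, the window size, and the function name `neighborhood_counts`; it is not the method as implemented by Zhao et al.

```python
from collections import Counter, defaultdict


def neighborhood_counts(genomes, query_clusters, window=1):
    """Tally the families of genes found within `window` positions of each query gene.

    genomes: maps a genome name to an ordered list of (gene_id, family) tuples.
    query_clusters: maps a query gene_id to its (hypothetical) network cluster.
    Returns a dict of cluster -> Counter of neighboring families.
    """
    counts = defaultdict(Counter)
    for genes in genomes.values():
        for i, (gene_id, _family) in enumerate(genes):
            cluster = query_clusters.get(gene_id)
            if cluster is None:
                continue  # not a member of the family under study
            lo = max(0, i - window)
            hi = min(len(genes), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[cluster][genes[j][1]] += 1
    return counts


# Made-up gene orders; "query" marks members of the family under study.
toy_genomes = {
    "genome1": [("g1", "transporter"), ("g2", "query"),
                ("g3", "dehydrogenase"), ("g4", "kinase")],
    "genome2": [("h1", "dehydrogenase"), ("h2", "query"), ("h3", "transporter")],
    "genome3": [("k1", "epimerase"), ("k2", "query"), ("k3", "kinase")],
}
toy_clusters = {"g2": "cluster1", "h2": "cluster1", "k2": "cluster2"}

counts = neighborhood_counts(toy_genomes, toy_clusters, window=1)
for cluster, families in sorted(counts.items()):
    print(cluster, dict(families))
```

Families that repeatedly appear beside the members of one cluster but not another are the kind of signal that could suggest a shared pathway, which would then need experimental confirmation.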
Affiliation(s)
- Suwen Zhao
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
- Ayano Sakai
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Xinshuai Zhang
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Matthew W Vetting
  - Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
- Ritesh Kumar
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Brandan Hillerich
  - Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
- Brian San Francisco
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Jose Solbiati
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Adam Steves
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Shoshana Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Eyal Akiva
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Alan Barber
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Ronald D Seidel
  - Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- Steven C Almo
  - Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
- John A Gerlt
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
- Matthew P Jacobson
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States