1
Yeung W, Zhou Z, Mathew L, Gravel N, Taujale R, O’Boyle B, Salcedo M, Venkat A, Lanzilotta W, Li S, Kannan N. Tree visualizations of protein sequence embedding space enable improved functional clustering of diverse protein superfamilies. Brief Bioinform 2023; 24:bbac619. [PMID: 36642409 PMCID: PMC9851311 DOI: 10.1093/bib/bbac619] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Revised: 12/09/2022] [Accepted: 12/17/2022] [Indexed: 01/17/2023] Open
Abstract
Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can infer structure-functional properties, despite protein language models being trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embedding derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with and extend upon previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.
Affiliation(s)
- Wayland Yeung
  - Institute of Bioinformatics, University of Georgia, 30602, Georgia, USA
- Zhongliang Zhou
  - School of Computing, University of Georgia, 30602, Georgia, USA
- Liju Mathew
  - Department of Microbiology, University of Georgia, 30602, Georgia, USA
- Nathan Gravel
  - Institute of Bioinformatics, University of Georgia, 30602, Georgia, USA
- Rahil Taujale
  - Institute of Bioinformatics, University of Georgia, 30602, Georgia, USA
- Brady O’Boyle
  - Department of Biochemistry and Molecular Biology, University of Georgia, 30602, Georgia, USA
- Mariah Salcedo
  - Department of Biochemistry and Molecular Biology, University of Georgia, 30602, Georgia, USA
- Aarya Venkat
  - Department of Biochemistry and Molecular Biology, University of Georgia, 30602, Georgia, USA
- William Lanzilotta
  - Department of Biochemistry and Molecular Biology, University of Georgia, 30602, Georgia, USA
- Sheng Li
  - School of Data Science, University of Virginia, 22903, Virginia, USA
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, 30602, Georgia, USA
  - Department of Biochemistry and Molecular Biology, University of Georgia, 30602, Georgia, USA
2
Brimberry MA, Mathew L, Lanzilotta W. Making and breaking carbon-carbon bonds in class C radical SAM methyltransferases. J Inorg Biochem 2022; 226:111636. [PMID: 34717253 PMCID: PMC8667262 DOI: 10.1016/j.jinorgbio.2021.111636] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/04/2021] [Revised: 10/07/2021] [Accepted: 10/12/2021] [Indexed: 01/03/2023]
Abstract
Radical S-adenosylmethionine (SAM) enzymes utilize a [4Fe-4S]¹⁺ cluster and S-(5'-adenosyl)-L-methionine (SAM) to generate a highly reactive radical and catalyze what is arguably the most diverse set of chemical reactions for any known enzyme family. At the heart of radical SAM catalysis is a highly reactive 5'-deoxyadenosyl radical intermediate (5'-dAdo•) generated through reductive cleavage of SAM or nucleophilic attack of the unique iron of the [4Fe-4S]⁺ cluster on the 5' C atom of SAM. Spectroscopic studies reveal that the 5'-dAdo• is transiently captured in an Fe-C bond (Ω species). In the presence of substrate, homolytic scission of this metal-carbon bond regenerates the 5'-dAdo• for catalytic hydrogen atom abstraction. While reminiscent of the adenosylcobalamin mechanism, radical SAM enzymes appear to encompass greater catalytic diversity. In this review we discuss recent developments for radical SAM enzymes involved in unique chemical rearrangements, specifically regarding class C radical SAM methyltransferases. Illuminating this class of radical SAM enzymes is especially significant, as many of these enzymes have been shown to play critical roles in pathogenesis and the synthesis of novel antimicrobial compounds.
Affiliation(s)
- Marley A. Brimberry
  - Department of Biochemistry and Molecular Biology & Center for Metalloenzyme Studies, Department of Chemistry, University of Georgia, Athens, GA 30602
- Liju Mathew
  - Department of Biochemistry and Molecular Biology & Center for Metalloenzyme Studies, Department of Chemistry, University of Georgia, Athens, GA 30602
- William Lanzilotta
  - Department of Biochemistry and Molecular Biology & Center for Metalloenzyme Studies, Department of Chemistry, University of Georgia, Athens, GA 30602. To whom correspondence should be addressed. Phone: (706) 542-1324; fax: (706) 542-1738.