1
Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM. Estimating error rates for single molecule protein sequencing experiments. PLoS Comput Biol 2024; 20:e1012258. [PMID: 38968291 PMCID: PMC11253918 DOI: 10.1371/journal.pcbi.1012258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 07/17/2024] [Accepted: 06/17/2024] [Indexed: 07/07/2024] [Open Access]
Abstract
The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM) based approach extends whatprot, where we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
Affiliation(s)
- Matthew Beauregard Smith
- Oden Institute, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
- Erisyon Inc., Austin, Texas, United States of America
- Thomas Blom
- Erisyon Inc., Austin, Texas, United States of America
- Heather D. Stout
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
- Erisyon Inc., Austin, Texas, United States of America
- Angela M. Bardo
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
- Erisyon Inc., Austin, Texas, United States of America
- Edward M. Marcotte
- Oden Institute, The University of Texas at Austin, Austin, Texas, United States of America
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
2
Liu H, Deol H, Raeisbahrami A, Askari H, Wight CD, Lynch VM, Anslyn EV. A Method for Rigorously Selective Capture and Simultaneous Fluorescent Labeling of N-Terminal Glycine Peptides. J Am Chem Soc 2024; 146:13727-13732. [PMID: 38728661 DOI: 10.1021/jacs.4c04141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Although chemical methods for the selective derivatization of amino acid (AA) side chains in peptides and proteins are available, selective N-terminal labeling is challenging, especially for glycine, which has no side chain at the α-carbon position. We report here a double activation at glycine's α-methylene group that allows this AA to be differentiated from the other 19 AAs. A condensation reaction of dibenzoylmethane with glycine results in the formation of an imine, and subsequent tautomerization is followed by intramolecular cyclization, leading to the formation of a fluorescent pyrrole ring. Additionally, the approach exhibits compatibility with AAs possessing reactive side chains. Further, the method allows for selective pull-down assays of N-terminal glycine peptides from mixtures without prior knowledge of the N-terminal peptide distribution.
Affiliation(s)
- Hongxu Liu
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu 610065, P. R. China
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Harnimarta Deol
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Ava Raeisbahrami
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Hadis Askari
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Christopher D Wight
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Vincent M Lynch
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Eric V Anslyn
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
3
Emenike B, Czabala P, Farhi J, Swaminathan J, Anslyn EV, Spangle J, Raj M. Tertiary Amine Coupling by Oxidation for Selective Labeling of Dimethyl Lysine Post-Translational Modifications. J Am Chem Soc 2024; 146:10621-10631. [PMID: 38584362 PMCID: PMC11027136 DOI: 10.1021/jacs.4c00253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/22/2024] [Accepted: 03/25/2024] [Indexed: 04/09/2024]
Abstract
Lysine dimethylation (Kme2) is a crucial post-translational modification (PTM) that regulates biological processes and is implicated in diseases. There is significant interest in globally identifying these methylation marks. Unfortunately, this remains challenging due to the lack of robust technologies for selectively labeling Kme2. To address this, we present a chemical method named tertiary amine coupling by oxidation (TACO). This method selectively modifies Kme2 to aldehydes using Selectfluor and a base. The resulting aldehydes from Kme2 were then functionalized using reductive amination, thiolamine, and oxime chemistry. We successfully demonstrated the versatility of TACO in selectively labeling Kme2 peptides and proteins in complex cell lysate mixtures with varying payloads, including affinity tags and fluorophores. We further showed the application of TACO chemistry for the identification of Kme2 sites at a single-molecule level by fluorosequencing. We discovered 30 novel Kme2 sites, in addition to 5 previously known Kme2 sites, by proteomics analysis of TACO-modified nuclear extracts. Our work establishes a unique strategy for covalently modifying Kme2, facilitating the global identification of low-abundance Kme2-PTMs and their sites within complex cell lysate mixtures.
Affiliation(s)
- Benjamin Emenike
- Department of Chemistry, Emory University, Atlanta, Georgia 30322, United States
- Patrick Czabala
- Department of Chemistry, Emory University, Atlanta, Georgia 30322, United States
- Jonathan Farhi
- Department of Radiation Oncology, Emory University School of Medicine, Atlanta, Georgia 30322, United States
- Jagannath Swaminathan
- Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Eric V. Anslyn
- Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Jennifer Spangle
- Department of Radiation Oncology, Emory University School of Medicine, Atlanta, Georgia 30322, United States
- Monika Raj
- Department of Chemistry, Emory University, Atlanta, Georgia 30322, United States
4
Spears RJ, Chudasama V. Recent advances in N- and C-terminus cysteine protein bioconjugation. Curr Opin Chem Biol 2023; 75:102306. [PMID: 37236135 DOI: 10.1016/j.cbpa.2023.102306] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/01/2022] [Revised: 03/12/2023] [Accepted: 03/20/2023] [Indexed: 05/28/2023]
Abstract
Advances in the site-specific chemical modification of proteins, also referred to as protein bioconjugation, have proved instrumental in revolutionary approaches to designing new protein-based therapeutics. Of the sites available for protein modification, cysteine residues or the termini of proteins have proved especially popular owing to their favorable properties for site-specific modification. Strategies that, therefore, specifically target cysteine at the termini offer a combination of these favorable properties of cysteine and termini bioconjugation. In this review, we discuss these strategies with a particular focus on those reported recently and provide our opinion on the future direction of the field.
Affiliation(s)
- Richard J Spears
- Department of Chemistry, University College London, 20 Gordon Street, London, UK
- Vijay Chudasama
- Department of Chemistry, University College London, 20 Gordon Street, London, UK
5
Smith MB, Simpson ZB, Marcotte EM. Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier. PLoS Comput Biol 2023; 19:e1011157. [PMID: 37253025 PMCID: PMC10256185 DOI: 10.1371/journal.pcbi.1011157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/09/2023] [Accepted: 05/04/2023] [Indexed: 06/01/2023] [Open Access]
Abstract
We present a machine learning-based interpretive framework (whatprot) for analyzing single molecule protein sequencing data produced by fluorosequencing, a recently developed proteomics technology that determines sparse amino acid sequences for many individual peptide molecules in a highly parallelized fashion. Whatprot uses Hidden Markov Models (HMMs) to represent the states of each peptide undergoing the various chemical processes during fluorosequencing, and applies these in a Bayesian classifier, in combination with pre-filtering by a k-Nearest Neighbors (kNN) classifier trained on large volumes of simulated fluorosequencing data. We have found that by combining the HMM based Bayesian classifier with the kNN pre-filter, we are able to retain the benefits of both, achieving both tractable runtimes and acceptable precision and recall for identifying peptides and their parent proteins from complex mixtures, outperforming the capabilities of either classifier on its own. Whatprot's hybrid kNN-HMM approach enables the efficient interpretation of fluorosequencing data using a full proteome reference database and should now also enable improved sequencing error rate estimates.
Affiliation(s)
- Edward M. Marcotte
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
6
Immel JR, Bloom S. carba-Nucleopeptides (cNPs): A Biopharmaceutical Modality Formed through Aqueous Rhodamine B Photoredox Catalysis. Angew Chem Int Ed Engl 2022; 61:e202205606. [PMID: 35507689 PMCID: PMC9256812 DOI: 10.1002/anie.202205606] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/16/2022] [Indexed: 12/14/2022]
Abstract
Exchanging the ribose backbone of an oligonucleotide for a peptide can enhance its physiologic stability and nucleic acid binding affinity. Ordinarily, the eneamino nitrogen atom of a nucleobase is fused to the side chain of a polypeptide through a new C-N bond. The discovery of C-C linked nucleobases in the human transcriptome reveals new opportunities for engineering nucleopeptides that replace the traditional C-N bond with a non-classical C-C bond, liberating a captive nitrogen atom and promoting new hydrogen bonding and π-stacking interactions. We report the first late-stage synthesis of C-C linked carba-nucleopeptides (cNPs) using aqueous Rhodamine B photoredox catalysis. We prepare brand-new cNPs in batch, in parallel, and in flow using three long-wavelength photochemical setups. We detail the mechanism of our reaction by experimental and computational studies and highlight the essential role of diisopropylethylamine as a bifurcated two-electron reductant.
Affiliation(s)
- Jacob R Immel
- Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66045, USA
- Steven Bloom
- Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66045, USA
7
Immel JR, Bloom S. carba-Nucleopeptides (cNPs): A Biopharmaceutical Modality Formed through Aqueous Rhodamine B Photoredox Catalysis. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202205606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Jacob R. Immel
- Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66045, USA
- Steven Bloom
- Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66045, USA