1. Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM. Estimating error rates for single molecule protein sequencing experiments. PLoS Comput Biol 2024; 20:e1012258. [PMID: 38968291 PMCID: PMC11253918 DOI: 10.1371/journal.pcbi.1012258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 07/17/2024] [Accepted: 06/17/2024] [Indexed: 07/07/2024] [Open access]
Abstract
The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM) based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
Affiliation(s)
- Matthew Beauregard Smith
  - Oden Institute, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
  - Erisyon Inc., Austin, Texas, United States of America
- Thomas Blom
  - Erisyon Inc., Austin, Texas, United States of America
- Heather D. Stout
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
  - Erisyon Inc., Austin, Texas, United States of America
- Angela M. Bardo
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
  - Erisyon Inc., Austin, Texas, United States of America
- Edward M. Marcotte
  - Oden Institute, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
2. Lan T, Dong Y, Jiang L, Zhang Y, Sui X. Analytical approaches for assessing protein structure in protein-rich food: A comprehensive review. Food Chem X 2024; 22:101365. [PMID: 38623506 PMCID: PMC11016869 DOI: 10.1016/j.fochx.2024.101365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 03/24/2024] [Accepted: 04/05/2024] [Indexed: 04/17/2024] [Open access]
Abstract
This review focuses on changes in nutrition and functional properties of protein-rich foods, primarily attributed to alterations in protein structures. We provide a comprehensive overview and comparison of commonly used laboratory methods for protein structure identification, aiming to offer readers a convenient understanding of these techniques. The review covers a range of detection technologies employed in food protein analysis and conducts an extensive comparison to identify the most suitable method for various proteins. While these techniques offer distinct advantages for protein structure determination, the inherent complexity of food matrices presents ongoing challenges. Further research is necessary to develop and enhance more robust detection methods to improve accuracy in protein conformation and structure analysis.
Affiliation(s)
- Tian Lan
  - College of Food Science, Northeast Agricultural University, Harbin 150030, China
- Yabo Dong
  - College of Food Science, Northeast Agricultural University, Harbin 150030, China
- Lianzhou Jiang
  - College of Food Science, Northeast Agricultural University, Harbin 150030, China
- Yan Zhang
  - College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Xiaonan Sui
  - College of Food Science, Northeast Agricultural University, Harbin 150030, China
3. Peyretaillade E, Akossi RF, Tournayre J, Delbac F, Wawrzyniak I. How to overcome constraints imposed by microsporidian genome features to ensure gene prediction? J Eukaryot Microbiol 2024:e13038. [PMID: 38934348 DOI: 10.1111/jeu.13038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/03/2024] [Accepted: 06/10/2024] [Indexed: 06/28/2024]
Abstract
Since the advent of sequencing techniques and due to their continuous evolution, it has become easier and less expensive to obtain the complete genome sequence of any organism. Nevertheless, to elucidate all biological processes governing organism development, quality annotation is essential. In genome annotation, predicting gene structure is one of the most important and captivating challenges for computational biology. This aspect of annotation requires continual optimization, particularly for genomes as unusual as those of microsporidia. Indeed, this group of fungal-related parasites exhibits specific features (highly reduced gene sizes, sequences with high rate of evolution) linked to their evolution as intracellular parasites, requiring the implementation of specific annotation approaches to consider all these features. This review aimed to outline these characteristics and to assess the increasingly efficient approaches and tools that have enhanced the accuracy of gene prediction for microsporidia, both in terms of sensitivity and specificity. Subsequently, a final part will be dedicated to postgenomic approaches aimed at reinforcing the annotation data generated by prediction software. These approaches include the characterization of other understudied genes, such as those encoding regulatory noncoding RNAs or very small proteins, which also play crucial roles in the life cycle of these microorganisms.
Affiliation(s)
- Reginal F Akossi
  - LMGE, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France
- Jérémy Tournayre
  - INRAE, UMR Herbivores, Université Clermont Auvergne, VetAgro Sup, Saint-Genès-Champanelle, France
- Frédéric Delbac
  - LMGE, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France
- Ivan Wawrzyniak
  - LMGE, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France
4. Penedo JC. Topographic fingerprinting of single proteins and proteoforms. Nature Nanotechnology 2024; 19:580-581. [PMID: 38528111 DOI: 10.1038/s41565-024-01638-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Affiliation(s)
- J Carlos Penedo
  - Centre of Biophotonics, Laboratory for Biophysics and Biomolecular Dynamics, School of Physics and Astronomy and School of Biology, University of St. Andrews, St Andrews, UK
5. Filius M, van Wee R, de Lannoy C, Westerlaken I, Li Z, Kim SH, de Agrela Pinto C, Wu Y, Boons GJ, Pabst M, de Ridder D, Joo C. Full-length single-molecule protein fingerprinting. Nature Nanotechnology 2024; 19:652-659. [PMID: 38351230 DOI: 10.1038/s41565-023-01598-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 12/22/2023] [Indexed: 03/21/2024]
Abstract
Proteins are the primary functional actors of the cell. While proteoform diversity is known to be highly biologically relevant, current protein analysis methods are of limited use for distinguishing proteoforms. Mass spectrometric methods, in particular, often provide only ambiguous information on post-translational modification sites, and sequences of co-existing modifications may not be resolved. Here we demonstrate fluorescence resonance energy transfer (FRET)-based single-molecule protein fingerprinting to map the location of individual amino acids and post-translational modifications within single full-length protein molecules. Our data show that both intrinsically disordered proteins and folded globular proteins can be fingerprinted with a subnanometer resolution, achieved by probing the amino acids one by one using single-molecule FRET via DNA exchange. This capability was demonstrated through the analysis of alpha-synuclein, an intrinsically disordered protein, by accurately quantifying isoforms in mixtures using a machine learning classifier, and by determining the locations of two O-GlcNAc moieties. Furthermore, we demonstrate fingerprinting of the globular proteins Bcl-2-like protein 1, procalcitonin and S100A9. We anticipate that our ability to perform proteoform identification with the ultimate sensitivity may unlock exciting new venues in proteomics research and biomarker-based diagnosis.
Affiliation(s)
- Mike Filius
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
- Raman van Wee
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
- Carlos de Lannoy
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Ilja Westerlaken
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
- Zeshi Li
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
- Sung Hyun Kim
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
  - Department of Physics, Ewha Womans University, Seoul, Republic of Korea
- Cecilia de Agrela Pinto
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
- Yunfei Wu
  - Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
- Geert-Jan Boons
  - Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
  - Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
- Martin Pabst
  - Department of Biotechnology, Delft University of Technology, Delft, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Chirlmin Joo
  - Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Delft, The Netherlands
  - Department of Physics, Ewha Womans University, Seoul, Republic of Korea
6. Wacholder A, Carvunis AR. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol 2023; 21:e3002409. [PMID: 38048358 PMCID: PMC10721188 DOI: 10.1371/journal.pbio.3002409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/14/2023] [Accepted: 10/30/2023] [Indexed: 12/06/2023] [Open access]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
7. Wacholder A, Carvunis AR. Biological Factors and Statistical Limitations Prevent Detection of Most Noncanonical Proteins by Mass Spectrometry. bioRxiv 2023:2023.03.09.531963. [PMID: 36945638 PMCID: PMC10028962 DOI: 10.1101/2023.03.09.531963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here we leveraged recent advances in ribosome profiling and mass spectrometry to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly-expressed to be detected by shotgun mass spectrometry at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for four noncanonical proteins in mass spectrometry data, which were also supported by evolution and translation data. These results illustrate the power of mass spectrometry to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly-expressed proteins.
8. Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J. Robust and scalable single-molecule protein sequencing with fluorosequencing. bioRxiv 2023:2023.09.15.558007. [PMID: 37745461 PMCID: PMC10516020 DOI: 10.1101/2023.09.15.558007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The need to accurately survey proteins and their modifications with ever higher sensitivities, particularly in clinical settings with limited samples, is spurring development of new single molecule proteomics technologies. Fluorosequencing is one such highly parallelized single molecule peptide sequencing platform, based on determining the sequence positions of select amino acid types within peptides to enable their identification and quantification from a reference database. Here, we describe substantial improvements to fluorosequencing, including identifying fluorophores compatible with the sequencing chemistry, mitigating dye-dye interactions through the use of extended polyproline linkers, and developing an end-to-end workflow for sample preparation and sequencing. We demonstrate by fluorosequencing peptides in mixtures and identifying a target neoantigen from a database of decoy MHC peptides, highlighting the potential of the technology for high sensitivity clinical applications.
Affiliation(s)
- Heather D Stout
  - Erisyon, Inc. Austin, TX, 78752
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
- Christopher Martin
  - Erisyon, Inc. Austin, TX, 78752
  - Department of Chemistry, The University of Texas at Austin, Austin, TX 78712
- Fan Tu
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
- Angela M Bardo
  - Erisyon, Inc. Austin, TX, 78752
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
- Eric V Anslyn
  - Department of Chemistry, The University of Texas at Austin, Austin, TX 78712
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
- Jagannath Swaminathan
  - Erisyon, Inc. Austin, TX, 78752
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
9. Locatelli E, Bianco V, Valeriani C, Malgaretti P. Nonmonotonous Translocation Time of Polymers across Pores. Physical Review Letters 2023; 131:048101. [PMID: 37566871 DOI: 10.1103/physrevlett.131.048101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 07/06/2023] [Indexed: 08/13/2023]
Abstract
Polymers confined in corrugated channels, i.e., channels of varying amplitude, display multiple local maxima and minima of the diffusion coefficient upon increasing their degree of polymerization N. We propose a theoretical effective free energy for linear polymers based on a Fick-Jacobs approach. We validate the predictions against numerical data, obtaining quantitative agreement for the effective free energy, the diffusion coefficient, and the mean first passage time. Finally, we employ the effective free energy to compute the polymer lengths N_{min} at which the diffusion coefficient presents a minimum: we find a scaling expression that we rationalize with a blob model. Our results could be useful to design porous adsorbers, that separate polymers of different sizes without the action of an external flow.
Affiliation(s)
- Emanuele Locatelli
  - Dipartimento di Fisica e Astronomia, Università di Padova, via Marzolo 8, I-35131 Padova, Italy
  - INFN, Sezione di Padova, via Marzolo 8, I-35131 Padova, Italy
- Valentino Bianco
  - Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Chantal Valeriani
  - Departamento de Estructura de la Materia, Física Termica y Electronica, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Paolo Malgaretti
  - Helmholtz Institut Erlangen-Nürnberg for Renewable Energy (IEK-11), Forschungszentrum Jülich, Cauer Strasse 1, 91058, Erlangen, Germany
10. Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM. Estimating error rates for single molecule protein sequencing experiments. bioRxiv 2023:2023.07.18.549591. [PMID: 37502879 PMCID: PMC10370102 DOI: 10.1101/2023.07.18.549591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM) based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently, which should help prevent overfitting. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
Affiliation(s)
- Matthew Beauregard Smith
  - Oden Institute, The University of Texas at Austin, Austin, TX 78712
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
- Heather D Stout
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
  - Erisyon Inc., Austin, TX 78752
- Angela M Bardo
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
  - Erisyon Inc., Austin, TX 78752
- Edward M Marcotte
  - Oden Institute, The University of Texas at Austin, Austin, TX 78712
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712
11. Liu M, Li J, Tan CS. Unlocking the Power of Nanopores: Recent Advances in Biosensing Applications and Analog Front-End. Biosensors 2023; 13:598. [PMID: 37366963 DOI: 10.3390/bios13060598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/24/2023] [Accepted: 05/29/2023] [Indexed: 06/28/2023]
Abstract
The biomedical field has always fostered innovation and the development of various new technologies. Beginning in the last century, demand for picoampere-level current detection in biomedicine has increased, leading to continuous breakthroughs in biosensor technology. Among emerging biomedical sensing technologies, nanopore sensing has shown great potential. This paper reviews nanopore sensing applications, such as chiral molecules, DNA sequencing, and protein sequencing. However, the ionic current for different molecules differs significantly, and the detection bandwidths vary as well. Therefore, this article focuses on current sensing circuits, and introduces the latest design schemes and circuit structures of different feedback components of transimpedance amplifiers mainly used in nanopore DNA sequencing.
Affiliation(s)
- Miao Liu
  - Medical College, Tianjin University, Tianjin 300072, China
- Junyang Li
  - Medical College, Tianjin University, Tianjin 300072, China
- Cherie S Tan
  - Medical College, Tianjin University, Tianjin 300072, China
12. Smith MB, Simpson ZB, Marcotte EM. Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier. PLoS Comput Biol 2023; 19:e1011157. [PMID: 37253025 PMCID: PMC10256185 DOI: 10.1371/journal.pcbi.1011157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/09/2023] [Accepted: 05/04/2023] [Indexed: 06/01/2023] [Open access]
Abstract
We present a machine learning-based interpretive framework (whatprot) for analyzing single molecule protein sequencing data produced by fluorosequencing, a recently developed proteomics technology that determines sparse amino acid sequences for many individual peptide molecules in a highly parallelized fashion. Whatprot uses Hidden Markov Models (HMMs) to represent the states of each peptide undergoing the various chemical processes during fluorosequencing, and applies these in a Bayesian classifier, in combination with pre-filtering by a k-Nearest Neighbors (kNN) classifier trained on large volumes of simulated fluorosequencing data. We have found that by combining the HMM based Bayesian classifier with the kNN pre-filter, we are able to retain the benefits of both, achieving both tractable runtimes and acceptable precision and recall for identifying peptides and their parent proteins from complex mixtures, outperforming the capabilities of either classifier on its own. Whatprot's hybrid kNN-HMM approach enables the efficient interpretation of fluorosequencing data using a full proteome reference database and should now also enable improved sequencing error rate estimates.
Affiliation(s)
- Edward M. Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
13. Mansuri MS, Williams K, Nairn AC. Uncovering biology by single-cell proteomics. Commun Biol 2023; 6:381. [PMID: 37031277 PMCID: PMC10082756 DOI: 10.1038/s42003-023-04635-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 02/25/2023] [Indexed: 04/10/2023] [Open access]
Abstract
Recent technological advances have opened the door to single-cell proteomics that can answer key biological questions regarding how protein expression, post-translational modifications, and protein interactions dictate cell state in health and disease.
Affiliation(s)
- M Shahid Mansuri
  - Yale/NIDA Neuroproteomics Center and Department of Molecular Biophysics and Biochemistry, Yale School of Medicine, New Haven, Connecticut, USA
- Kenneth Williams
  - Yale/NIDA Neuroproteomics Center and Department of Molecular Biophysics and Biochemistry, Yale School of Medicine, New Haven, Connecticut, USA
- Angus C Nairn
  - Yale/NIDA Neuroproteomics Center and Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
14. Liang L, Qin F, Wang S, Wu J, Li R, Wang Z, Ren M, Liu D, Wang D, Astruc D. Overview of the materials design and sensing strategies of nanopore devices. Coord Chem Rev 2023. [DOI: 10.1016/j.ccr.2022.214998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
15. He H, Wu C, Saqib M, Hao R. Single-molecule fluorescence methods for protein biomarker analysis. Anal Bioanal Chem 2023:10.1007/s00216-022-04502-9. [PMID: 36609860 DOI: 10.1007/s00216-022-04502-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/27/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 01/09/2023]
Abstract
Proteins have been considered key building blocks of life. In particular, the protein content of an organism and a cell offers significant information for the in-depth understanding of disease and biological processes. Single-molecule protein detection/sequencing tools will revolutionize clinical (proteomics) research, offering ultrasensitivity for low-abundance biomarker (protein) detection, which is important for the realization of early-stage disease diagnosis and single-cell proteomics. This improved detection/measurement capability delivers new sets of techniques to explore new frontiers and address important challenges in various interdisciplinary areas including nanostructured materials, molecular medicine, molecular biology, and chemistry. Importantly, fluorescence-based methods have emerged as indispensable tools for single protein detection/sequencing studies, providing a higher signal-to-noise ratio (SNR). Improvements in fluorescent dyes/probes and detector capabilities coupled with advanced (image) analysis strategies have fueled current developments for single protein biomarker detections. For example, in comparison to conventional ELISA (i.e., based on ensemble measurements), single-molecule fluorescence detection is more sensitive, faster, and more accurate, with reduced background and high throughput. In comparison to MS sequencing, fluorescence-based single-molecule protein sequencing can achieve the sequencing of peptides themselves with higher sensitivity. This review summarizes various typical single-molecule detection technologies including their methodology (modes of operation), detection limits, advantages and drawbacks, and current challenges with recent examples. We describe the fluorescence-based single-molecule protein sequencing/detection based on five kinds of technologies such as fluorosequencing, N-terminal amino acid binder, nanopore light sensing, and DNA nanotechnology. Finally, we present our perspective for developing high-performance fluorescence-based sequencing/detection techniques.
Affiliation(s)
- Haihan He
- Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China; Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Chuhong Wu
- Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China; Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Muhammad Saqib
- Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China; Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China; Institute of Chemistry, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan 64200, Pakistan
- Rui Hao
- Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China; Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
|
16
|
He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023; 88:187-200. [PMID: 36596352 DOI: 10.1016/j.semcancer.2022.12.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 11/17/2022] [Revised: 12/16/2022] [Accepted: 12/29/2022] [Indexed: 01/02/2023]
Abstract
With biotechnological advancements, innovative omics technologies are constantly emerging that have enabled researchers to access multi-layer information from the genome, epigenome, transcriptome, proteome, metabolome, and more. A wealth of omics technologies, including bulk and single-cell omics approaches, have empowered researchers to characterize different molecular layers at unprecedented scale and resolution, providing a holistic view of tumor behavior. Multi-omics analysis allows systematic interrogation of various molecular information at each biological layer while posing tricky challenges regarding how to extract valuable insights from the exponentially increasing amount of multi-omics data. Therefore, efficient algorithms are needed to reduce the dimensionality of the data while simultaneously dissecting the mysteries behind the complex biological processes of cancer. Artificial intelligence has demonstrated the ability to analyze complementary multi-modal data streams within the oncology realm. The coincident development of multi-omics technologies and artificial intelligence algorithms has fuelled the development of cancer precision medicine. Here, we present state-of-the-art omics technologies and outline a roadmap of multi-omics integration analysis using an artificial intelligence strategy. The advances made using artificial intelligence-based multi-omics approaches are described, especially concerning early cancer screening, diagnosis, response assessment, and prognosis prediction. Finally, we discuss the challenges faced in multi-omics analysis, along with tentative future trends in this field. With the increasing application of artificial intelligence in multi-omics analysis, we anticipate a paradigm shift in precision medicine, driven by artificial intelligence-based multi-omics technologies.
Affiliation(s)
- Xiujing He
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Xiaowei Liu
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Fengli Zuo
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Hubing Shi
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Jing Jing
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
|