1
Parmar BS, Peeters MKR, Boonen K, Clark EC, Baggerman G, Menschaert G, Temmerman L. Identification of Non-Canonical Translation Products in C. elegans Using Tandem Mass Spectrometry. Front Genet 2021; 12:728900. [PMID: 34759956 PMCID: PMC8575065 DOI: 10.3389/fgene.2021.728900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Accepted: 09/16/2021] [Indexed: 11/22/2022] [Open access]
Abstract
Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
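The totals quoted in this abstract can be cross-checked with a short calculation. The sketch below only restates the counts given above; the variable names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above (variable names are illustrative).
novel_from_existing_resources = 51  # novel proteins found by mining available MS data
novel_from_lc_tims = 40             # novel proteins detected by LC-TIMS-MS/MS
novel_in_both = 6                   # proteins common to both sets
splice_variants = 467               # annotated splice variants

# Proteins found by both routes are counted once (inclusion-exclusion).
novel_total = novel_from_existing_resources + novel_from_lc_tims - novel_in_both

# The resource combines the splice variants with the novel proteins.
resource_total = splice_variants + novel_total

print(novel_total)     # 85
print(resource_total)  # 552
```

Both results agree with the 85 novel proteins and 552 total entries reported in the abstract.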
Affiliation(s)
- Bhavesh S. Parmar
  - Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Marlies K. R. Peeters
  - Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Kurt Boonen
  - Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Ellie C. Clark
  - Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Geert Baggerman
  - Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Gerben Menschaert
  - Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Liesbet Temmerman
  - Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
2
Vitorino R, Choudhury M, Guedes S, Ferreira R, Thongboonkerd V, Sharma L, Amado F, Srivastava S. Peptidomics and proteogenomics: background, challenges and future needs. Expert Rev Proteomics 2021; 18:643-659. [PMID: 34517741 DOI: 10.1080/14789450.2021.1980388] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
Abstract
INTRODUCTION: With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development.
AREAS COVERED: Peptidomics and proteogenomics are the two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. Literature search was performed in the PubMed database and is up to date.
EXPERT OPINION: This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.
Affiliation(s)
- Rui Vitorino
  - Faculdade de Medicina da Universidade do Porto, Porto, Portugal
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
  - Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Manisha Choudhury
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
- Sofia Guedes
  - Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Rita Ferreira
  - Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Visith Thongboonkerd
  - Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Francisco Amado
  - Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Sanjeeva Srivastava
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
3
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
4
Zhong J, Sun Y, Xie M, Peng W, Zhang C, Wu FX, Wang J. Proteoform characterization based on top-down mass spectrometry. Brief Bioinform 2020; 22:1729-1750. [PMID: 32118252 DOI: 10.1093/bib/bbaa015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2019] [Revised: 01/23/2020] [Indexed: 12/16/2022] [Open access]
Abstract
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yusui Sun
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Wei Peng
  - Kunming University of Science and Technology, Kunming, Yunnan, China
- Chushu Zhang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
5
Hale OJ, Cooper HJ. In situ mass spectrometry analysis of intact proteins and protein complexes from biological substrates. Biochem Soc Trans 2020; 48:317-326. [PMID: 32010951 PMCID: PMC7054757 DOI: 10.1042/bst20190793] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/18/2019] [Revised: 01/09/2020] [Accepted: 01/09/2020] [Indexed: 12/15/2022]
Abstract
Advances in sample preparation, ion sources and mass spectrometer technology have enabled the detection and characterisation of intact proteins. The challenges associated include an appropriately soft ionisation event, efficient transmission and detection of the often delicate macromolecules. Ambient ion sources, in particular, offer a wealth of strategies for analysis of proteins from solution environments, and directly from biological substrates. The last two decades have seen rapid development in this area. Innovations include liquid extraction surface analysis, desorption electrospray ionisation and nanospray desorption electrospray ionisation. Similarly, developments in native mass spectrometry allow protein-protein and protein-ligand complexes to be ionised and analysed. Identification and characterisation of these large ions involves a suite of hyphenated mass spectrometry techniques, often including the coupling of ion mobility spectrometry and fragmentation techniques. The latter include collision, electron and photon-induced methods, each with their own characteristics and benefits for intact protein identification. In this review, recent developments for in situ protein analysis are explored, with a focus on ion sources and tandem mass spectrometry techniques used for identification.
Affiliation(s)
- Oliver J. Hale
  - School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K.
- Helen J. Cooper
  - School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K.
6
Noor Z, Ranganathan S. Bioinformatics approaches for improving seminal plasma proteome analysis. Theriogenology 2019; 137:43-49. [PMID: 31186128 DOI: 10.1016/j.theriogenology.2019.05.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Reproduction efficiency of male animals is one of the key factors influencing the sustainability of livestock. Mass spectrometry (MS) based proteomics has become an important tool for studying seminal plasma proteomes. In this review, we summarize bioinformatics analysis strategies for current proteomics approaches, for identifying novel biomarkers of reproductive robustness.
Affiliation(s)
- Zainab Noor
  - Department of Molecular Sciences, Macquarie University, Sydney, Australia
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, Australia
7
Li Z, He B, Kou Q, Wang Z, Wu S, Liu Y, Feng W, Liu X. Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 2018; 19:494. [PMID: 30591035 PMCID: PMC6309053 DOI: 10.1186/s12859-018-2462-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When the species studied in a mass spectrometry experiment lacks its proteome sequence database, a homologous protein sequence database can be used for proteoform identification. The accuracy of homologous protein sequences affects the sensitivity of proteoform identification and the accuracy of mass shift localization.
RESULTS: We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results.
CONCLUSIONS: Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.
Affiliation(s)
- Ziwei Li
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Bo He
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Qiang Kou
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
- Zhe Wang
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Si Wu
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Weixing Feng
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
8
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
9
Sandrin TR, Demirev PA. Characterization of microbial mixtures by mass spectrometry. Mass Spectrom Rev 2018; 37:321-349. [PMID: 28509357 DOI: 10.1002/mas.21534] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 11/23/2016] [Revised: 03/09/2017] [Accepted: 03/09/2017] [Indexed: 05/27/2023]
Abstract
MS applications in microbiology have increased significantly in the past 10 years, due in part to the proliferation of regulator-approved commercial MALDI MS platforms for rapid identification of clinical infections. In parallel, with the expansion of MS technologies in the "omics" fields, novel MS-based research efforts to characterize organismal as well as environmental microbiomes have emerged. Successful characterization of microorganisms found in complex mixtures of other organisms remains a major challenge for researchers and clinicians alike. Here, we review recent MS advances toward addressing that challenge. These include sample preparation methods and protocols, as well as established (e.g., MALDI) and newer (e.g., atmospheric pressure ionization, API) techniques. MALDI mass spectra of intact cells contain predominantly information on the highly expressed house-keeping proteins used as biomarkers. The API methods are applicable to small-biomolecule analysis (e.g., phospholipids and lipopeptides) and facilitate species differentiation. MS hardware and techniques (e.g., tandem MS), including diverse ion source/mass analyzer combinations, are discussed. Relevant examples for microbial mixture characterization utilizing these combinations are provided. Chemometrics and bioinformatics methods and algorithms, including those applied to large-scale MS data acquisition in microbial metaproteomics and MS imaging of biofilms, are highlighted. Select MS applications for polymicrobial culture analysis in environmental and clinical microbiology are reviewed as well.
Affiliation(s)
- Todd R Sandrin
  - School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona
- Plamen A Demirev
  - Applied Physics Laboratory, Johns Hopkins University, Laurel, Maryland
10
Kolmogorov M, Kennedy E, Dong Z, Timp G, Pevzner PA. Single-molecule protein identification by sub-nanopore sensors. PLoS Comput Biol 2017; 13:e1005356. [PMID: 28486472 PMCID: PMC5423552 DOI: 10.1371/journal.pcbi.1005356] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 03/29/2016] [Accepted: 01/09/2017] [Indexed: 11/19/2022] [Open access]
Abstract
Recent advances in top-down mass spectrometry enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but so far it has only been successfully applied to DNA sequencing. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denatured, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.

Protein identification is the key step in many proteomics studies. Currently, the most popular technique for intact protein analysis is top-down mass spectrometry, which recently enabled high-throughput identification of many proteins and their proteoforms. However, this approach requires large amounts of material and is currently limited to short proteins, typically less than 30 kDa. On the other hand, nanopore sensors promise single-molecule sensitivity in protein analysis, but an approach for the identification of a single protein from its blockade current (nanospectrum) has remained elusive, since the signal from the sensors relates to the amino acid sequence of the protein in a poorly understood way. In this work we describe the first algorithm for protein identification based on nanospectra associated with translocation of proteins through pores with sub-nanometer diameters. While identification accuracy does not yet allow reliable processing of complex protein samples, we believe that rapidly improving experimental protocols, along with new computational algorithms, will make this a viable protein identification approach in the near future.
Affiliation(s)
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
- Eamonn Kennedy
  - Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
- Zhuxin Dong
  - Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
- Gregory Timp
  - Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
11