1
Das D, Podder S. Microscale marvels: unveiling the macroscopic significance of micropeptides in human health. Brief Funct Genomics 2024; 23:624-638. [PMID: 38706311 DOI: 10.1093/bfgp/elae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 04/07/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024] Open
Abstract
Non-coding RNAs encode micropeptides from small open reading frames located within the RNA. These micropeptides are involved in a variety of functions within the body and are emerging as a key piece of the puzzle for complex biomolecular signaling pathways. Recent studies highlight the pivotal role of small peptides in regulating important biological processes such as DNA repair, gene expression, muscle regeneration and immune responses. Conversely, altered expression of micropeptides contributes to the progression of various diseases, including cardiovascular diseases, neurological disorders and several types of cancer, such as colorectal, hepatocellular and lung cancer. This review examines the dual impact of micropeptides on health and pathology, exploring their role in preserving normal physiological homeostasis and their involvement in the triggering and progression of diseases.
Affiliation(s)
- Deepyaman Das
  - Computational and Systems Biology Laboratory, Department of Microbiology, Raiganj University, Raiganj, Uttar Dinajpur, West Bengal-733134, India
- Soumita Podder
  - Computational and Systems Biology Laboratory, Department of Microbiology, Raiganj University, Raiganj, Uttar Dinajpur, West Bengal-733134, India
2
Zulfiqar M, Singh V, Steinbeck C, Sorokina M. Review on computer-assisted biosynthetic capacities elucidation to assess metabolic interactions and communication within microbial communities. Crit Rev Microbiol 2024:1-40. [PMID: 38270170 DOI: 10.1080/1040841x.2024.2306465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 01/12/2024] [Indexed: 01/26/2024]
Abstract
Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.
Affiliation(s)
- Mahnoor Zulfiqar
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Vinay Singh
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Data Science and Artificial Intelligence, Research and Development, Pharmaceuticals, Bayer, Berlin, Germany
3
Boekweg H, Payne SH. Challenges and Opportunities for Single-cell Computational Proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it is necessary to realize that current algorithms being employed on single-cell data were designed for bulk data and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Affiliation(s)
- Hannah Boekweg
  - Biology Department, Brigham Young University, Provo, Utah, USA
- Samuel H Payne
  - Biology Department, Brigham Young University, Provo, Utah, USA
4
Liu T, Zou B, He M, Hu Y, Dou Y, Cui T, Tan P, Li S, Rao S, Huang Y, Liu S, Cai K, Wang D. LncReader: identification of dual functional long noncoding RNAs using a multi-head self-attention mechanism. Brief Bioinform 2023; 24:6961607. [PMID: 36575567 DOI: 10.1093/bib/bbac579] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/15/2022] [Revised: 11/11/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022] Open
Abstract
Long noncoding ribonucleic acids (RNAs; LncRNAs) endowed with both protein-coding and noncoding functions are referred to as 'dual functional lncRNAs'. Recently, dual functional lncRNAs have been intensively studied and identified as involved in various fundamental cellular processes. However, apart from time-consuming and cell-type-specific experiments, there is virtually no in silico method for predicting the identity of dual functional lncRNAs. Here, we developed a deep-learning model with a multi-head self-attention mechanism, LncReader, to identify dual functional lncRNAs. Our data demonstrated that LncReader showed multiple advantages compared to various classical machine learning methods using benchmark datasets from our previously reported cncRNAdb project. Moreover, to obtain independent in-house datasets for robust testing, mass spectrometry proteomics combined with RNA-seq and Ribo-seq were applied in four leukaemia cell lines, which further confirmed that LncReader achieved the best performance compared to other tools. Therefore, LncReader provides an accurate and practical tool that enables fast dual functional lncRNA identification.
Affiliation(s)
- Tianyuan Liu
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Bohao Zou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Department of Statistics, University of California Davis, Davis, California, USA
- Manman He
  - State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing 100005, China
- Yongfei Hu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China
- Yiying Dou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tianyu Cui
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Puwen Tan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shaobin Li
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Shuan Rao
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Yan Huang
  - Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Sixi Liu
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Kaican Cai
  - Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Dong Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China; Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
5
Micropeptides translated from putative long non-coding RNAs. Acta Biochim Biophys Sin (Shanghai) 2022; 54:292-300. [PMID: 35538037 PMCID: PMC9827906 DOI: 10.3724/abbs.2022010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) transcribed in mammals and other eukaryotes were thought to have no protein-coding capability. However, recent studies have suggested that many lncRNAs are mis-annotated and actually contain coding sequences that are translated into functional peptides by the ribosomal machinery; these functional peptides are called micropeptides or small peptides. Here we review the rapidly advancing field of micropeptides translated from putative lncRNAs, describe the strategies for their identification, and elucidate their critical roles in many fundamental biological processes. We also discuss the prospects of research in micropeptides and the potential applications of micropeptides.
6
Fotakis G, Trajanoski Z, Rieder D. Computational cancer neoantigen prediction: current status and recent advances. Immuno-Oncology Technology 2021; 12:100052. [PMID: 35755950 PMCID: PMC9216660 DOI: 10.1016/j.iotech.2021.100052] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
Over the last few decades, immunotherapy has shown significant therapeutic efficacy in a broad range of cancer types. Antitumor immune responses are contingent on the recognition of tumor-specific antigens, which are termed neoantigens. Tumor neoantigens are ideal targets for immunotherapy since they can be recognized as non-self antigens by the host immune system and thus are able to elicit an antitumor T-cell response. There are an increasing number of studies that highlight the importance of tumor neoantigens in immunoediting and in the sensitivity to immune checkpoint blockade. Therefore, one of the most fundamental tasks in the field of immuno-oncology research is the identification of patient-specific neoantigens. To this end, a plethora of computational approaches have been developed in order to predict tumor-specific aberrant peptides and quantify their likelihood of binding to patients' human leukocyte antigen molecules in order to be recognized by T cells. In this review, we systematically summarize and present the most recent advances in computational neoantigen prediction, and discuss the challenges and novel methods that are being developed to resolve them. Tumors have the ability to acquire immune escape mechanisms. Tumor-specific aberrant peptides (neoantigens) can elicit an immune response by the host immune system. The identification of neoantigens is one of the most fundamental tasks in the field of immuno-oncology research. A plethora of computational approaches have been developed in order to predict patient-specific neoantigens.
Affiliation(s)
- G Fotakis
  - Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Z Trajanoski
  - Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- D Rieder
  - Institute of Bioinformatics, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
7
Abstract
Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.
8
Elpa DP, Prabhu GRD, Wu SP, Tay KS, Urban PL. Automation of mass spectrometric detection of analytes and related workflows: A review. Talanta 2019; 208:120304. [PMID: 31816721 DOI: 10.1016/j.talanta.2019.120304] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/24/2019] [Revised: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 12/13/2022]
Abstract
The developments in mass spectrometry (MS) in the past few decades reveal the power and versatility of this technology. MS methods are utilized in routine analyses as well as research activities involving a broad range of analytes (elements and molecules) and countless matrices. However, manual MS analysis is gradually becoming a thing of the past. In this article, the available MS automation strategies are critically evaluated. Automation of analytical workflows culminating in MS detection involves automated operations in any of the steps related to sample handling/treatment before MS detection, sample introduction, MS data acquisition, and MS data processing. Automated MS workflows help to overcome the intrinsic limitations of MS methodology regarding reproducibility, throughput, and the expertise required to operate MS instruments. Such workflows often comprise automated off-line and on-line steps such as sampling, extraction, derivatization, and separation. The most common instrumental tools include autosamplers, multi-axis robots, flow injection systems, and lab-on-a-chip. Prototyping customized automated MS systems is a way to introduce non-standard automated features to MS workflows. The review highlights the enabling role of automated MS procedures in various sectors of academic research and industry. Examples include applications of automated MS workflows in bioscience, environmental studies, and exploration of outer space.
Affiliation(s)
- Decibel P Elpa
  - Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan; Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
- Gurpur Rakesh D Prabhu
  - Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan; Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
- Shu-Pao Wu
  - Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan
- Kheng Soo Tay
  - Department of Chemistry, Faculty of Science, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Pawel L Urban
  - Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan; Frontier Research Center on Fundamental and Applied Sciences of Matters, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
9
Azari S, Xue B, Zhang M, Peng L. Preprocessing Tandem Mass Spectra Using Genetic Programming for Peptide Identification. Journal of the American Society for Mass Spectrometry 2019; 30:1294-1307. [PMID: 31025295 DOI: 10.1007/s13361-019-02196-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2018] [Revised: 01/15/2019] [Accepted: 03/11/2019] [Indexed: 06/09/2023]
Abstract
One of the major challenges in proteomics is peptide identification from mass spectra containing a high noise ratio and a small number of signal (b-/y-ion) peaks. However, the accuracy and reliability of peptide identification in such highly imbalanced MS/MS data can be improved by applying a preprocessing step prior to peptide identification that aims to discriminate b-/y-ions from noise peaks in the spectra. In this study, we report a genetic programming (GP)-based preprocessing method for de-noising highly imbalanced and noisy CID MS/MS spectra. GP has become a popular machine learning method based on automatic programming. GP preprocesses the highly noisy MS/MS spectra by classifying peaks as noise peaks or signal peaks in a binary classification manner. Meanwhile, a set of spectral fragment features based on the MS/MS fragmentation rules is extracted from the dataset so that GP can investigate their discriminating abilities. An MS/MS spectral dataset containing thousands of spectra is used to train the GP model. As the GP tree-based representation has the capability for implicit feature selection during the evolutionary process, the evolved GP model with the selected features is compared with the best threshold-based method. The results show that the GP method improved the reliability of peptide identification and increased the identification rate of a de novo sequencing tool, PEAKS, to 99.4% from 80.1% achieved by the best threshold-based method. Moreover, the improvement in peptide identification by a database search tool, SEQUEST, using the data preprocessed by the GP method was statistically significant compared to the other methods.
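For orientation, the kind of threshold-based baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the method from the paper: the function name `threshold_denoise`, the relative-intensity rule and the 5% cutoff are all assumptions introduced here, and the peak values are made up.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def threshold_denoise(peaks: List[Peak], rel_threshold: float = 0.05) -> List[Peak]:
    """Keep peaks whose intensity is at least rel_threshold times the base peak.

    Hypothetical baseline: a GP-based classifier would instead decide per peak
    from several fragment features, not from intensity alone.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    cutoff = rel_threshold * base_intensity
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


# Made-up spectrum: three stronger peaks and two weak ones.
spectrum = [
    (175.119, 1200.0),
    (201.500, 30.0),
    (276.166, 800.0),
    (300.200, 45.0),
    (389.251, 90.0),
]
cleaned = threshold_denoise(spectrum, rel_threshold=0.05)
print([mz for mz, _ in cleaned])
```

A single intensity cutoff cannot separate a weak signal peak from a strong noise peak, which is the limitation a learned per-peak classifier is meant to address.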
Affiliation(s)
- Samaneh Azari
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
  - School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6140, New Zealand
- Bing Xue
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
- Mengjie Zhang
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
- Lifeng Peng
  - Centre for Biodiscovery and School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
10
Sun W, Liu Y, Lajoie GA, Ma B, Zhang K. An Improved Approach for N-Linked Glycan Structure Identification from HCD MS/MS Spectra. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:388-395. [PMID: 28489544 DOI: 10.1109/tcbb.2017.2701819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/07/2023]
Abstract
Glycosylation is a frequently observed post-translational modification on proteins. Currently, tandem mass spectrometry (MS/MS) serves as an efficient analytical technique for characterizing structures of oligosaccharides. However, developing effective computational approaches for identifying glycan structures from mass spectra is still a great challenge in glycoproteomics research. In this study, we proposed an approach for matching the input spectra with glycan structures acquired from a glycan structure database by incorporating a de novo sequencing assisted ranking scheme. The proposed approach is implemented as a software tool, GlycoNovoDB, for automated glycan structure identification from HCD MS/MS of glycopeptides. Experimental results showed that GlycoNovoDB can identify glycans effectively and has better performance than our previously proposed de novo sequencing algorithm as well as another software GlycoMaster DB.
11
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022] Open
Abstract
The latest advances in high-throughput techniques during the past decade allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others as nodes, and their connections as edges within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
Affiliation(s)
- Georgios A Pavlopoulos
  - Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Panagiota I Kontou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Athanasia Pavlopoulou
  - Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
- Costas Bouyioukos
  - Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
- Evripides Markou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Pantelis G Bagos
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
12
AlJadda K, Korayem M, Ortiz C, Grainger T, Miller JA, Rasheed KM, Kochut KJ, Peng H, York WS, Ranzinger R, Porterfield M. Mining massive hierarchical data using a scalable probabilistic graphical model. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.10.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
13
Maabreh M, Qolomany B, Springstead J, Alsmadi I, Gupta A. Deep vs. Shallow Learning-based Filters of MSMS Spectra in Support of Protein Search Engines. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2017; 2017:1175-1182. [PMID: 34408917 PMCID: PMC8370709 DOI: 10.1109/bibm.2017.8217824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Despite the linear relation between the number of observed spectra and the searching time, the current protein search engines, even the parallel versions, could take several hours to search a large number of MSMS spectra, which can be generated in a short time. After a laborious searching process, some (and at times, the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MSMS filter to remove non-identifiable spectra. We compare and evaluate the deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 different datasets generated from two different search engines, different instruments, different sizes and different species, we experimentally show that deep learning models are powerful in filtering MSMS spectra. We also show that our simple feature list is informative, as the shallow learning algorithms also showed encouraging results in filtering the MSMS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learning algorithms, Random Forest, Support Vector Machine and Neural Networks showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful in instances where the protein(s) of interest are in lower cellular or tissue concentration, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
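The trade-off between the two filter types can be made concrete with a small worked calculation. The sketch below uses the removal and loss rates quoted in the abstract; the dataset size of 100,000 spectra and the 40% identifiable fraction are illustrative assumptions, not figures from the paper, and `filter_outcome` is a hypothetical helper name.

```python
def filter_outcome(total, identifiable_fraction, removed_non_identifiable, lost_identifiable):
    """Return how many spectra remain to be searched after pre-filtering.

    removed_non_identifiable: share of non-identifiable spectra the filter excludes.
    lost_identifiable: share of identifiable spectra the filter wrongly excludes.
    """
    identifiable = total * identifiable_fraction
    non_identifiable = total - identifiable
    kept_identifiable = identifiable * (1 - lost_identifiable)
    kept_non_identifiable = non_identifiable * (1 - removed_non_identifiable)
    searched = kept_identifiable + kept_non_identifiable
    return {
        "searched": searched,
        "kept_identifiable": kept_identifiable,
        "workload_reduction": 1 - searched / total,
    }


# Assumed dataset: 100,000 spectra, 40% of which are identifiable.
deep = filter_outcome(100_000, 0.40, removed_non_identifiable=0.50, lost_identifiable=0.09)
shallow = filter_outcome(100_000, 0.40, removed_non_identifiable=0.70, lost_identifiable=0.25)

for name, result in (("deep", deep), ("shallow", shallow)):
    print(name, round(result["searched"]), round(result["kept_identifiable"]),
          round(result["workload_reduction"], 3))
```

Under these assumptions the shallow filters cut the search workload more, but at the cost of discarding many more identifiable spectra, which matches the abstract's suggestion that the deep model suits low-abundance proteins.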
Affiliation(s)
- Majdi Maabreh
  - Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- Basheer Qolomany
  - Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- James Springstead
  - Department of Chemical and Paper Engineering, Western Michigan University, Kalamazoo, MI, USA
- Izzat Alsmadi
  - Department of Computing and Cyber Security, Texas A&M University, San Antonio, TX, USA
- Ajay Gupta
  - Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
14
Abstract
Mass spectrometry (MS) is an analytical technique for determining the composition of a sample. In bottom-up techniques, peptide mass fingerprinting (PMF) is widely used to identify proteins from MS datasets. In this article, the authors developed a novel network-based inference software termed NBPMF. By analyzing the peptide-protein bipartite network, they designed new peptide-protein matching score functions. They present two methods: the static one, ProbS, is based on an independent probability framework; and the dynamic one, HeatS, depicts input data as dependent peptides. The authors also use linear regression to adjust the matching score according to the masses of proteins. In addition, they consider the order of retention time to further correct the score function. In post-processing, a peak can only be assigned to one peptide in order to reduce random matches. Finally, the authors try to filter out false positive proteins. The experiments on simulated and real data demonstrate that their NBPMF approaches lead to significantly improved performance compared to several state-of-the-art methods.
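The post-processing rule that a peak is assigned to at most one peptide can be illustrated with a short sketch. This is not the NBPMF implementation: choosing the peptide by smallest mass error, the 0.5 Da tolerance, the function name `assign_peaks` and the peptide labels are all assumptions made for illustration.

```python
def assign_peaks(peak_masses, peptide_masses, tol=0.5):
    """Assign each observed peak to at most one candidate peptide.

    peak_masses: list of observed masses.
    peptide_masses: dict mapping a peptide label to its theoretical mass.
    A peak goes to the closest peptide within tol; peaks with no candidate
    inside the tolerance stay unassigned. (Assumed rule, for illustration.)
    """
    assignment = {}
    for index, observed in enumerate(peak_masses):
        best = None  # (mass error, peptide label)
        for label, theoretical in peptide_masses.items():
            error = abs(observed - theoretical)
            if error <= tol and (best is None or error < best[0]):
                best = (error, label)
        if best is not None:
            assignment[index] = best[1]
    return assignment


# Made-up masses: the first two peaks each fall within tolerance of two peptides.
peaks = [1000.2, 1000.6, 1500.0]
peptides = {"PEP_A": 1000.3, "PEP_B": 1000.5, "PEP_C": 2000.0}
print(assign_peaks(peaks, peptides))
```

Without this rule, each of the first two peaks would count as a match for both PEP_A and PEP_B, inflating the evidence for whichever proteins contain them.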
Affiliation(s)
- Zhewei Liang
  - Department of Computer Science, University of Western Ontario, London, Canada
- Gilles Lajoie
  - Department of Biochemistry, University of Western Ontario, London, Canada
- Kaizhong Zhang
  - Department of Computer Science, University of Western Ontario, London, Canada
15
Liu Y, Sun W, John J, Lajoie G, Ma B, Zhang K. De Novo Sequencing Assisted Approach for Characterizing Mixture MS/MS Spectra. IEEE Trans Nanobioscience 2016; 15:166-76. [PMID: 26800542 DOI: 10.1109/tnb.2016.2519841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Extensive research has been conducted on the computational analysis of mass spectrometry-based proteomics data. However, challenges remain, one of which is the low identification rate of the collected spectral data. A specific contributing factor is the existence of mixture spectra in the collected MS/MS spectra, which are generated by the concurrent fragmentation of multiple precursors in one sequencing attempt. The frequently observed mixture spectra necessitate the development of effective computational approaches to characterize those non-conventional spectral data. In this research, we proposed an approach for matching the query mixture spectra with a pair of peptide sequences acquired from the protein database by incorporating a special de novo assisted filtration strategy. The experimental results on two different datasets of MS/MS spectra containing mixed ion fragments from multiple peptides demonstrated the efficiency of the integrated filtration strategy in reducing the examination space and verified the effectiveness of the proposed matching scheme as well.
16
Sun W, Kuljanin M, Pittock P, Ma B, Zhang K, Lajoie GA. An Effective Approach for Glycan Structure De Novo Sequencing From HCD Spectra. IEEE Trans Nanobioscience 2016; 15:177-84. [PMID: 26800543 DOI: 10.1109/tnb.2016.2519861] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
Mass spectrometry has become a widely used analytical technique for proteomics studies because of its high throughput and sensitivity. One specific application is the characterization of glycan structure. Glycosylation is a frequently occurring post-translational modification of proteins that is relevant to human health. Therefore, it is important to develop effective computational methods to automate the identification of glycan structures from mass spectral data. In our research, we mathematically formulated the glycan de novo sequencing problem and proposed a heuristic algorithm for glycan de novo sequencing from HCD MS/MS spectra of N-linked glycopeptides. The algorithm proceeds along a carefully designed pathway to construct the best-matched tree structure from an MS/MS spectrum. Experimental results showed that our proposed approach can effectively identify glycan structures from HCD MS/MS spectra.
17
Zhou T, Sha J, Guo X. The need to revisit published data: A concept and framework for complementary proteomics. Proteomics 2015; 16:6-11. [PMID: 26552962 DOI: 10.1002/pmic.201500170] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/06/2015] [Revised: 08/26/2015] [Accepted: 11/04/2015] [Indexed: 12/14/2022]
Abstract
Tandem proteomic strategies based on large-scale and high-resolution mass spectrometry have been widely applied in various biomedical studies. However, protein sequence databases and proteomic software are continuously updated, so proteomic studies should not end with a fixed list of proteins. It is necessary and beneficial to revise the results regularly. In addition, the original proteomic studies usually focused on a limited aspect of protein information, and valuable information may remain undiscovered in the raw spectra. Several studies have reported novel findings by reanalyzing previously published raw data. However, there are still no standard guidelines for comprehensive reanalysis. In the present study, we propose the concept and a draft framework for complementary proteomics, which aims to revise protein lists or mine new discoveries by revisiting published data.
Affiliation(s)
- Tao Zhou
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
| | - Jiahao Sha
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
| | - Xuejiang Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
| |
|
18
|
Sage E, Brenac A, Alava T, Morel R, Dupré C, Hanay MS, Roukes ML, Duraffourg L, Masselon C, Hentz S. Neutral particle mass spectrometry with nanomechanical systems. Nat Commun 2015; 6:6482. [PMID: 25753929 PMCID: PMC4366497 DOI: 10.1038/ncomms7482] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 01/09/2015] [Accepted: 02/02/2015] [Indexed: 12/21/2022] Open
Abstract
Current approaches to mass spectrometry (MS) require ionization of the analytes of interest. For high-mass species, the resulting charge state distribution can be complex and difficult to interpret correctly. Here, using a setup comprising both conventional time-of-flight MS (TOF-MS) and nano-electromechanical systems-based MS (NEMS-MS) in situ, we show directly that NEMS-MS analysis is insensitive to charge state: the spectrum consists of a single peak whatever the species' charge state, making it significantly clearer than existing MS analysis. In subsequent tests, all the charged particles are electrostatically removed from the beam, and unlike TOF-MS, NEMS-MS can still measure masses. This demonstrates the possibility to measure mass spectra for neutral particles. Thus, it is possible to envisage MS-based studies of analytes that are incompatible with current ionization techniques and the way is now open for the development of cutting-edge system architectures with unique analytical capability.
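To illustrate why a charge state distribution complicates a conventional spectrum, the following minimal sketch lists the m/z positions at which one species would appear for several charge states. It assumes each charge comes from one added proton, which the abstract does not specify; the 150 kDa mass, the charge range, and the function names are illustrative only. A measurement that is insensitive to charge state would instead report a single value, the mass itself.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz_peaks(neutral_mass_da, charge_states):
    """Return the m/z value expected for each positive charge state z,
    assuming every charge comes from one added proton (an assumption)."""
    return {z: (neutral_mass_da + z * PROTON_MASS_DA) / z for z in charge_states}


def mass_from_mz(mz, z):
    """Invert the relation above to recover the neutral mass from one peak."""
    return z * mz - z * PROTON_MASS_DA


if __name__ == "__main__":
    mass = 150_000.0  # hypothetical high-mass analyte, in daltons
    for z, mz in mz_peaks(mass, range(20, 26)).items():
        # One species, six charge states, six distinct peaks on the m/z axis.
        print(f"z={z:2d}  m/z={mz:10.3f}")
```

Recovering the mass from any one of these peaks requires knowing its charge state, which is the interpretation step that a direct mass measurement avoids.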
Affiliation(s)
- Eric Sage
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, LETI, Minatec Campus, F-38054 Grenoble, France
| | - Ariel Brenac
- Université Grenoble Alpes, INAC-SP2M, F-38000 Grenoble, France
- CEA, INAC-SP2M, F-38000 Grenoble, France
| | - Thomas Alava
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, LETI, Minatec Campus, F-38054 Grenoble, France
| | - Robert Morel
- Université Grenoble Alpes, INAC-SP2M, F-38000 Grenoble, France
- CEA, INAC-SP2M, F-38000 Grenoble, France
| | - Cécilia Dupré
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, LETI, Minatec Campus, F-38054 Grenoble, France
| | - Mehmet Selim Hanay
- Departments of Physics, Applied Physics, and Bioengineering, Kavli Nanoscience Institute, California Institute of Technology, MC 149-33, Pasadena, California 91125, USA
| | - Michael L. Roukes
- Departments of Physics, Applied Physics, and Bioengineering, Kavli Nanoscience Institute, California Institute of Technology, MC 149-33, Pasadena, California 91125, USA
| | - Laurent Duraffourg
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, LETI, Minatec Campus, F-38054 Grenoble, France
| | - Christophe Masselon
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, IRTSV, Biologie à Grande Echelle, F-38054 Grenoble, France
- INSERM, U1038, F-38054 Grenoble, France
| | - Sébastien Hentz
- Université Grenoble Alpes, F-38000 Grenoble, France
- CEA, LETI, Minatec Campus, F-38054 Grenoble, France
| |
|
19
|
Sun W, Lajoie GA, Ma B, Zhang K. A Novel Algorithm for Glycan de novo Sequencing Using Tandem Mass Spectrometry. Bioinformatics Research and Applications 2015. [DOI: 10.1007/978-3-319-19048-8_27] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/21/2023]
|
20
|
|
21
|
He L, Ma B. ADEPTS: Advanced Peptide De Novo Sequencing with a Pair of Tandem Mass Spectra. J Bioinform Comput Biol 2011; 8:981-94. [DOI: 10.1142/s0219720010005099] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 05/12/2010] [Revised: 08/23/2010] [Accepted: 08/23/2010] [Indexed: 11/18/2022]
Abstract
De novo sequencing is an important task in proteomics to identify novel peptide sequences. Traditionally, only one MS/MS spectrum is used for the sequencing of a peptide; however, the use of multiple spectra of the same peptide with different types of fragmentation has the potential to significantly increase the accuracy and practicality of de novo sequencing. Research into the use of multiple spectra is in a nascent stage. We propose a general framework to combine the two different types of MS/MS data. Experiments demonstrate that our method significantly improves the de novo sequencing of existing software.
Affiliation(s)
- Lin He
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Bin Ma
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| |
|
22
|
Cottrell JS. Protein identification using MS/MS data. J Proteomics 2011; 74:1842-51. [PMID: 21635977 DOI: 10.1016/j.jprot.2011.05.014] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 02/02/2011] [Revised: 05/04/2011] [Accepted: 05/09/2011] [Indexed: 12/28/2022]
Abstract
The subject of this tutorial is protein identification and characterisation by database searching of MS/MS data. Peptide Mass Fingerprinting is excluded because it is covered in a separate tutorial. Practical aspects of database searching are emphasised, such as choice of sequence database, effect of mass tolerance, and how to identify post-translational modifications. The relationship between sensitivity and specificity is discussed, as is the challenge of using peptide match information to infer which proteins were present in the sample. Since these tutorials are introductory in nature, most references are to reviews rather than primary research papers. Some familiarity with mass spectrometry and protein chemistry is assumed. There is an accompanying slide presentation, including speaker notes, and a collection of web-based practical exercises designed to reinforce key points. This tutorial is part of the International Proteomics Tutorial Programme (IPTP 6).
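The effect of mass tolerance mentioned above can be sketched with a toy precursor-mass filter. This is an illustration under stated assumptions, not the method of any particular search engine: the function names, the three-residue peptides, and the tolerance values are hypothetical, while the residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_within(observed, peptides, tol_ppm):
    """Keep the peptides whose theoretical mass lies within tol_ppm of observed."""
    return [p for p in peptides
            if abs(ppm_error(observed, peptide_mass(p))) <= tol_ppm]


if __name__ == "__main__":
    database = ["GAK", "GAQ", "GAG"]      # hypothetical toy database
    observed = peptide_mass("GAK")        # pretend this mass was measured
    for tol in (10, 200):
        print(tol, "ppm ->", candidates_within(observed, database, tol))
```

With these toy inputs, a 10 ppm tolerance keeps only `GAK`, whereas a 200 ppm tolerance also admits `GAQ`, because lysine and glutamine differ by only about 0.036 Da. A wider tolerance enlarges the candidate set that later scoring must discriminate, which is one way tolerance bears on the sensitivity and specificity trade-off.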
|
23
|
Miskevich F, Davis A, Leeprapaiwong P, Giganti V, Kostić NM, Angel LA. Metal complexes as artificial proteases in proteomics: A palladium(II) complex cleaves various proteins in solutions containing detergents. J Inorg Biochem 2011; 105:675-83. [DOI: 10.1016/j.jinorgbio.2011.01.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/21/2010] [Revised: 01/14/2011] [Accepted: 01/18/2011] [Indexed: 11/15/2022]
|