1
|
Ding Y, Tiwari P, Guo F, Zou Q. Shared subspace-based radial basis function neural network for identifying ncRNAs subcellular localization. Neural Netw 2022; 156:170-178. [DOI: 10.1016/j.neunet.2022.09.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/25/2022] [Accepted: 09/26/2022] [Indexed: 11/11/2022]
|
2
|
Zhou H, Wang H, Tang J, Ding Y, Guo F. Identify ncRNA Subcellular Localization via Graph Regularized k-Local Hyperplane Distance Nearest Neighbor Model on Multi-Kernel Learning. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3517-3529. [PMID: 34432632 DOI: 10.1109/tcbb.2021.3107621] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Non-coding RNAs (ncRNAs) are RNAs that do not encode protein sequences. Emerging evidence shows that many ncRNAs may participate in numerous biological processes and be widely involved in many types of cancer. Therefore, understanding their functionality is of great importance. As with proteins, the various functions of ncRNAs rely on their subcellular localizations. Traditional high-throughput wet-lab methods for identifying subcellular localization are time-consuming and costly. In this paper, we propose a novel computational method based on multi-kernel learning to identify multi-label ncRNA subcellular localizations via a graph regularized k-local hyperplane distance nearest neighbor algorithm. First, we construct six types of sequence-based feature descriptors and select important feature vectors. Then, we build a multi-kernel learning model with the Hilbert-Schmidt independence criterion (HSIC) to obtain optimal weights for the various features. Furthermore, we propose the graph regularized k-local hyperplane distance nearest neighbor algorithm (GHKNN) as a binary classification model for detecting one kind of non-coding RNA subcellular localization. Finally, we apply a One-vs-Rest strategy to decompose the multi-label problem of non-coding RNA subcellular localizations. Our method achieves excellent performance on three ncRNA datasets and three human ncRNA datasets, and outperforms other outstanding machine learning methods. Compared with existing methods, our model also performs well, especially on small datasets. We expect that this model will be useful for the prediction of subcellular localization and the study of important functional mechanisms of ncRNAs. Furthermore, we establish a user-friendly web server (http://ncrna.lbci.net/) with the implementation of our method, which can be easily used by most experimental scientists.
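The One-vs-Rest decomposition mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a simple nearest-centroid rule stands in for the GHKNN binary classifier, and the function names, feature vectors and compartment labels are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Stand-in binary model (nearest centroid), NOT the GHKNN of the paper.

    Assumes both classes occur at least once in y.
    """
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return lambda x: int(sq_dist(x, pos) < sq_dist(x, neg))


def fit_one_vs_rest(X: List[Vector], Y: List[Set[str]],
                    labels: List[str]) -> Dict[str, Callable[[Vector], int]]:
    """Train one binary model per label; Y[i] is the label set of sample i."""
    return {lab: fit_binary(X, [int(lab in ys) for ys in Y]) for lab in labels}


def predict_one_vs_rest(models: Dict[str, Callable[[Vector], int]],
                        x: Vector) -> Set[str]:
    """A sample receives every label whose binary model votes positive."""
    return {lab for lab, model in models.items() if model(x) == 1}


# Hypothetical toy data: 2-D features and two compartments.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.0, 0.0]]
Y = [{"nucleus"}, {"nucleus"}, {"cytoplasm"}, {"cytoplasm"},
     {"nucleus", "cytoplasm"}]
models = fit_one_vs_rest(X, Y, ["nucleus", "cytoplasm"])
print(predict_one_vs_rest(models, [0.0, 0.1]))
print(predict_one_vs_rest(models, [1.0, 1.0]))
```

Because each label gets its own independent binary model, a sample can be assigned zero, one or several localizations, which is what makes the scheme suitable for multi-label prediction.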
|
3
|
Shi H, Zhang S, Li X. R5hmCFDV: computational identification of RNA 5-hydroxymethylcytosine based on deep feature fusion and deep voting. Brief Bioinform 2022; 23:6658858. [PMID: 35945157 DOI: 10.1093/bib/bbac341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Revised: 07/17/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022]
Abstract
RNA 5-hydroxymethylcytosine (5hmC) is a kind of RNA modification that is related to the life activities of many organisms. Studying its distribution is very important for revealing its biological function. Previously, high-throughput sequencing was used to identify 5hmC, but it is expensive and inefficient. Therefore, machine learning is used to identify 5hmC sites. Here, we design a model called R5hmCFDV, which is divided into three main parts: feature representation, feature fusion and classification. (i) Pseudo dinucleotide composition, dinucleotide binary profile and frequency, natural vector and physicochemical property are used to extract features from four aspects: nucleotide composition, coding, natural language, and physical and chemical properties. (ii) To strengthen the relevance of features, we construct a novel feature fusion method. First, the attention mechanism is employed to process the four single features, which are then stitched together and fed to the convolution layer. After that, the output data are processed by BiGRU and BiLSTM, respectively. Finally, the features of these two parts are fused by the multiply function. (iii) We design the deep voting algorithm for classification by imitating the soft voting mechanism in the Python package. The base classifiers are a deep neural network (DNN), a convolutional neural network (CNN) and an improved gated recurrent unit (GRU). Following the principle of soft voting, weights are assigned to the predicted probabilities of the three classifiers. The predicted probability values are multiplied by the corresponding weights and then summed to obtain the final prediction results. We use 10-fold cross-validation to evaluate the model, and the evaluation indicators are significantly improved. The prediction accuracy on the two datasets is as high as 95.41% and 93.50%, respectively. This demonstrates the stronger competitiveness and generalization performance of our model. In addition, all datasets and source codes can be found at https://github.com/HongyanShi026/R5hmCFDV.
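The weighted soft voting step described in part (iii) can be sketched as follows. This is an illustration only, not code from the R5hmCFDV repository: the function name, the probabilities and the weights are hypothetical, and dividing by the weight total is an added assumption so that the result stays a probability even when the weights do not sum to 1.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Combine per-classifier positive-class probabilities, sample by sample.

    probabilities[k][i] is classifier k's probability for sample i.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per classifier")
    total = sum(weights)
    n_samples = len(probabilities[0])
    combined = []
    for i in range(n_samples):
        # Multiply each probability by its classifier weight, then sum.
        score = sum(w * p[i] for w, p in zip(weights, probabilities))
        combined.append(score / total)
    return combined


# Hypothetical outputs of three base classifiers for three samples.
dnn = [0.9, 0.2, 0.6]
cnn = [0.8, 0.4, 0.4]
gru = [0.7, 0.1, 0.7]
scores = soft_vote([dnn, cnn, gru], weights=[0.5, 0.3, 0.2])
labels = [int(s >= 0.5) for s in scores]  # threshold of 0.5 is an assumption
print(scores, labels)
```

With weights that already sum to 1, the normalization has no effect and the function reduces to the multiply-and-sum rule stated in the abstract.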
Affiliation(s)
- Hongyan Shi: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
- Shengli Zhang: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
- Xinjie Li: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
|
4
|
Pathak M, Pokhriyal P, Gandhi I, Khambhampaty S. Implementation of chemometrics, design of experiments and neural network analysis for prior process knowledge assessment (PPKA), failure modes and effect analysis (FMEA), scale-down model development (SDM) and process characterization for a chromatographic purification of Teriparatide. Biotechnol Prog 2022; 38:e3252. [PMID: 35340128 DOI: 10.1002/btpr.3252] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 11/10/2022]
Abstract
Process understanding and characterization form the foundation of a consistent and robust biologics manufacturing process. Using appropriate modelling tools and machine learning approaches, the process data can be monitored in real time to avoid manufacturing risks. In this article, we outline an approach to implementing chemometrics and machine learning tools (neural network analysis) to model and predict the behaviour of a mixed-mode chromatography step for a biosimilar (Teriparatide) as a case study. The process development data and process knowledge were assimilated into a prior process knowledge assessment using chemometrics tools to derive important parameters critical to performance indicators (i.e. potential quality and process attributes) and to establish the severity ranking for the FMEA analysis. The characterization data of the chromatographic operation are presented along with the determination of the critical, key and non-key process parameters, set points, and operating, process acceptance and characterized ranges. The scale-down model establishment was assessed using traditional approaches and novel approaches such as the batch evolution model and neural network analysis. The batch evolution model was further used to demonstrate batch monitoring through direct chromatographic data, thus demonstrating its application for continuous process verification. Assimilation of process knowledge through a structured data acquisition approach, built in from process development to continuous process verification, was demonstrated to result in a data analytics driven model that can be coupled with machine learning tools for real-time process monitoring. We recommend application of these approaches with the FDA guidance on stage-wise process development and validation to reduce manufacturing risks.
Affiliation(s)
- Mili Pathak: R&D, Intas Pharmaceuticals Ltd. (Biopharma Division), Ahmedabad, Gujarat, India
- Prashant Pokhriyal: R&D, Intas Pharmaceuticals Ltd. (Biopharma Division), Ahmedabad, Gujarat, India
- Irshad Gandhi: R&D, Intas Pharmaceuticals Ltd. (Biopharma Division), Ahmedabad, Gujarat, India
|
5
|
Wang H, Ding Y, Tang J, Zou Q, Guo F. Identify RNA-associated subcellular localizations based on multi-label learning using Chou's 5-steps rule. BMC Genomics 2021; 22:56. [PMID: 33451286 PMCID: PMC7811227 DOI: 10.1186/s12864-020-07347-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/05/2020] [Accepted: 12/22/2020] [Indexed: 12/04/2022]
Abstract
BACKGROUND Biological functions of biomolecules rely on the cellular compartments in which they are located. Importantly, RNAs are assigned to specific locations in a cell, enabling the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers only solve the problem of single-label classification. It is of great practical significance to extend RNA subcellular localization to a multi-label classification problem. RESULTS In this study, we extract multi-label classification datasets of RNA-associated subcellular localizations for various types of RNAs, and then construct subcellular localization datasets for four RNA categories. In order to study Homo sapiens, we further establish human RNA subcellular localization datasets. Furthermore, we utilize different nucleotide property composition models to extract effective features that adequately represent the important information of nucleotide sequences. In the most critical part, we address a major challenge: fusing the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel can be put into an integration support vector machine model for identifying multi-label RNA subcellular localizations. Our method obtained excellent average precision results of 0.703, 0.757, 0.787, and 0.800, respectively, on the four RNA datasets. CONCLUSION Specifically, our novel method outperforms other prediction tools on novel benchmark datasets. Moreover, we establish a user-friendly web server with the implementation of our method.
Affiliation(s)
- Hao Wang: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computational Science and Engineering, University of South Carolina, Columbia, 29208, SC, US
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Fei Guo: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
6
|
Valentine SJ, Ewing MA, Dilger JM, Glover MS, Geromanos S, Hughes C, Clemmer DE. Using ion mobility data to improve peptide identification: intrinsic amino acid size parameters. J Proteome Res 2011; 10:2318-29. [PMID: 21417239 PMCID: PMC3138335 DOI: 10.1021/pr1011312] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 01/16/2023]
Abstract
A new method for enhancing peptide ion identification in proteomics analyses using ion mobility data is presented. Ideally, direct comparisons of experimental drift times (t(D)) with a standard mobility database could be used to rank candidate peptide sequence assignments. Such a database would represent only a fraction of sequences in protein databases and significant difficulties associated with the verification of data for constituent peptide ions would exist. A method that employs intrinsic amino acid size parameters to obtain ion mobility predictions that can be used to rank candidate peptide ion assignments is proposed. Intrinsic amino acid size parameters have been determined for doubly charged peptide ions from an annotated yeast proteome. Predictions of ion mobilities using the intrinsic size parameters are more accurate than those obtained from a polynomial fit to t(D) versus molecular weight data. More than a 2-fold improvement in prediction accuracy has been observed for a group of arginine-terminated peptide ions 12 residues in length. The use of this predictive enhancement as a means to aid peptide ion identification is discussed, and a simple peptide ion scoring scheme is presented.
Affiliation(s)
- Stephen J Valentine: Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
7
|
Liu C, Wang H, Fu Y, Yuan Z, Chi H, Wang L, Sun R, He S. [Prediction of peptide retention time in reversed-phase liquid chromatography and its application in protein identification]. Se Pu 2010; 28:529-34. [PMID: 20873570 DOI: 10.3724/sp.j.1123.2010.00529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) is the mainstream technology for high-throughput protein identification. Peptide retention time in reversed-phase liquid chromatography (RPLC) is mainly determined by the physicochemical properties of the peptide and the LC conditions (stationary phase and mobile phase). Retention time can be predicted by analyzing these properties and quantifying their effects on peptide chromatographic behavior. Prediction of peptide retention time in LC can be used to improve the identification of peptides and post-translational modifications (PTMs). There are two main approaches to predicting retention time: retention coefficients and machine learning. The coefficient of determination between observed and predicted retention times can reach 0.93. With the development of LC-MS technology, retention time prediction will become an important tool to facilitate protein identification.
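As a rough illustration of the retention-coefficient approach and of the coefficient of determination mentioned in this abstract, the sketch below sums one coefficient per residue and scores predictions with R². The coefficient values and peptide strings are hypothetical placeholders, not published values, and the purely additive form is an assumption about the simplest variant of the method.

```python
from typing import Mapping, Sequence

# Hypothetical per-residue retention coefficients (arbitrary units).
HYPOTHETICAL_COEFFS = {"G": 0.0, "A": 1.0, "K": -2.0, "L": 9.0, "F": 10.0}


def predict_retention(peptide: str, coeffs: Mapping[str, float],
                      intercept: float = 0.0) -> float:
    """Additive model: intercept plus the sum of residue coefficients."""
    return intercept + sum(coeffs[aa] for aa in peptide)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


print(predict_retention("ALK", HYPOTHETICAL_COEFFS))
print(predict_retention("GLF", HYPOTHETICAL_COEFFS, intercept=2.0))
print(r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

In practice the coefficients would be fitted by regression on peptides with measured retention times under fixed LC conditions, since they depend on the stationary and mobile phases.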
Affiliation(s)
- Chao Liu: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
|
8
|
Alpert AJ, Petritis K, Kangas L, Smith RD, Mechtler K, Mitulović G, Mohammed S, Heck AJR. Peptide orientation affects selectivity in ion-exchange chromatography. Anal Chem 2010; 82:5253-9. [PMID: 20481592 PMCID: PMC2884984 DOI: 10.1021/ac100651k] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
Here we demonstrate that separation of proteolytic peptides, having the same net charge and one basic residue, is affected by their specific orientation toward the stationary phase in ion-exchange chromatography. In electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) with an anion-exchange material, the C-terminus of the peptides is, on average, oriented toward the stationary phase. In cation exchange, the average peptide orientation is the opposite. Data with synthetic peptides, serving as orientation probes, indicate that in tryptic/Lys-C peptides the C-terminal carboxyl group appears to be in a zwitterionic bond with the side chain of the C-terminal Lys/Arg residue. In effect, the side chain is then less basic than the N-terminus, accounting for the specific orientation of tryptic and Lys-C peptides. Analyses of larger sets of peptides, generated from lysates by either Lys-N, Lys-C, or trypsin, reveal that specific peptide orientation affects the ability of charged side chains, such as phosphate residues, to influence retention. Phosphorylated residues that are remote in the sequence from the binding site affect retention less than those that are closer. When a peptide contains multiple charged sites, then orientation is observed to be less rigid and retention tends to be governed by the peptide's net charge rather than its sequence. These general observations could be of value in confirming a peptide's identification and, in particular, phosphosite assignments in proteomics analyses. More generally, orientation accounts for the ability of chromatography to separate peptides of the same composition but different sequence.
Affiliation(s)
- Andrew J Alpert: PolyLC Inc., 9151 Rumsey Road, Ste. 180, Columbia, Maryland 21045, USA
|
9
|
Harscoat-Schiavo C, Raminosoa F, Ronat-Heit E, Vanderesse R, Marc I. Modeling the separation of small peptides by cation-exchange chromatography. J Sep Sci 2010; 33:2447-57. [DOI: 10.1002/jssc.201000112] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
|
10
|
New ammunition for the proteomic reactor: strong anion exchange beads and multiple enzymes enhance protein identification and sequence coverage. Anal Bioanal Chem 2010; 397:3421-30. [PMID: 20517600 DOI: 10.1007/s00216-010-3791-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 04/03/2010] [Revised: 04/21/2010] [Accepted: 04/24/2010] [Indexed: 01/06/2023]
Abstract
The enrichment and processing of proteomic samples prior to multi-dimensional chromatography remain a challenge in 'gel-free' proteomics. We previously reported the development of a microfluidic device called the "proteomic reactor" that relied on enriching proteins by using strong cation exchange (SCX) followed by trypsin digestion in an interstitial volume as little as 50 nL. Here, we report a novel proteomic reactor that is based on polymeric strong anion exchange (SAX) material to analyse proteomic samples. We also compare the performance of the SAX proteomic reactor to our previously reported SCX proteomic reactor for analysing complex yeast proteomes. Our results indicate that the SAX proteomic reactor preferentially identifies more acidic peptides and proteins compared to the SCX reactor. We show that the SAX and SCX reactors are complementary and that their combination increases the number of unique peptides and proteins identified by 50%. Furthermore, we show that the number of proteins identified can be increased further by up to 40% using different proteolytic enzymes on the proteomic reactor.
|
11
|
Wang B, Valentine S, Plasencia M, Raghuraman S, Zhang X. Artificial neural networks for the prediction of peptide drift time in ion mobility mass spectrometry. BMC Bioinformatics 2010; 11:182. [PMID: 20380738 PMCID: PMC2874804 DOI: 10.1186/1471-2105-11-182] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 11/09/2009] [Accepted: 04/11/2010] [Indexed: 11/10/2022]
Abstract
Background There is increasing use of ion mobility-mass spectrometry (IMMS) in proteomics. IMMS combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS). It separates and detects peptide ions on a millisecond time-scale. IMS separates peptide ions based on drift time, which is determined by the collision cross-section of each peptide ion under a given experimental condition. A peptide ion's collision cross-section is related to the ion size and shape resulting from the peptide's amino acid sequence and its modifications. This inherent relation between the drift time of a peptide ion and its sequence indicates that drift time can be used to infer peptide sequence and, therefore, for peptide identification. Results This paper describes an artificial neural network (ANN) regression model for the prediction of peptide ion drift time in IMMS. Each peptide in this work was represented using three descriptors (i.e., molecular weight, sequence length and a two-dimensional sequence index). An ANN predictor consisting of four input nodes, three hidden nodes and one output node was constructed for peptide ion drift time prediction. For model training and testing, a 10-fold cross-validation strategy was employed on three datasets, each containing a different charge state. Dataset one contains 212 singly-charged peptide ions, dataset two has 306 doubly-charged peptide ions, and dataset three has 77 triply-charged peptide ions. Our proposed method achieved 94.4%, 93.6% and 74.2% prediction accuracy for singly-, doubly- and triply-charged peptide ions, respectively. Conclusions An ANN-based method has been developed for predicting the drift time of peptide ions in IMMS. The results achieved here demonstrate the effectiveness and efficiency of the prediction model. This work can enhance the confidence of protein identification when combined with current database search approaches.
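The 4-3-1 network topology described in this abstract can be sketched as a single forward pass. The abstract does not state the activation functions or trained weights, so the sigmoid hidden layer, the linear output, the function names and all numeric values below are assumptions for illustration. The four inputs correspond to molecular weight, sequence length and the two components of the sequence index, presumably scaled beforehand.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_drift_time(features: Sequence[float],
                       w_hidden: Sequence[Sequence[float]],
                       b_hidden: Sequence[float],
                       w_out: Sequence[float],
                       b_out: float) -> float:
    """Forward pass: 4 inputs -> 3 sigmoid hidden nodes -> 1 linear output."""
    if len(features) != 4 or len(w_hidden) != 3 or len(w_out) != 3:
        raise ValueError("expected a 4-3-1 topology")
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, untrained parameters: zero hidden weights give 0.5 per node.
features = [0.5, 0.3, 0.2, 0.1]
w_hidden = [[0.0] * 4 for _ in range(3)]
b_hidden = [0.0, 0.0, 0.0]
w_out = [1.0, 2.0, 3.0]
print(predict_drift_time(features, w_hidden, b_hidden, w_out, b_out=0.5))
```

A network this small has only 19 trainable parameters (12 + 3 hidden, 3 + 1 output), which is consistent with the modest dataset sizes reported.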
Affiliation(s)
- Bing Wang: Department of Electronics and Information Engineering, Anhui University of Technology, Ma'anshan, 243002, China
|
12
|
Wang B, Valentine S, Raghuraman S, Plasencia M, Zhang X. Abstracts of UT-ORNL-KBRIN (University of Tennessee-Oak Ridge National Laboratory-Kentucky Bioinformatics Network) Bioinformatics Summit 2009. Pikeville, Tennessee, USA. March 20-22, 2009. BMC Bioinformatics 2009; 10 Suppl 7:A1-18. [PMID: 19735588 PMCID: PMC3313255 DOI: 10.1186/1471-2105-10-s7-a1] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
|
13
|
Bączek T, Kaliszan R. Predictions of peptides' retention times in reversed-phase liquid chromatography as a new supportive tool to improve protein identification in proteomics. Proteomics 2009; 9:835-47. [DOI: 10.1002/pmic.200800544] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022]
|
14
|
Ning ZB, Li QR, Dai J, Li RX, Shieh CH, Zeng R. Fractionation of Complex Protein Mixture by Virtual Three-Dimensional Liquid Chromatography Based on Combined pH and Salt Steps. J Proteome Res 2008; 7:4525-37. [DOI: 10.1021/pr800318j] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 02/05/2023]
Affiliation(s)
- Zhi-Bin Ning: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Qing-Run Li: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Jie Dai: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Rong-Xia Li: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Chia-Hui Shieh: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Rong Zeng: Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, 200031, China
|
15
|
Pfeifer N, Leinenbach A, Huber CG, Kohlbacher O. Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics. BMC Bioinformatics 2007; 8:468. [PMID: 18053132 PMCID: PMC2254445 DOI: 10.1186/1471-2105-8-468] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 06/13/2007] [Accepted: 11/30/2007] [Indexed: 12/03/2022]
Abstract
Background High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies in the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes, but they usually require large amounts of training data. Results We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter improves the fraction of correctly identified peptide mass spectra significantly. Conclusion The proposed kernel function is well-suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
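The idea of a p-value-based retention time filter can be sketched as follows. The abstract does not specify the error model, so this example assumes that the deviation between observed and predicted retention time is Gaussian with a known standard deviation `sigma`; the function names, peptide strings, retention times and the 0.05 threshold are likewise hypothetical.

```python
import math
from typing import List, Tuple

# (peptide, observed retention time, predicted retention time)
Candidate = Tuple[str, float, float]


def rt_p_value(observed: float, predicted: float, sigma: float) -> float:
    """Two-sided p-value of the deviation under an assumed Gaussian error."""
    z = abs(observed - predicted) / sigma
    return math.erfc(z / math.sqrt(2.0))


def filter_identifications(candidates: List[Candidate], sigma: float,
                           alpha: float = 0.05) -> List[Candidate]:
    """Keep identifications whose deviation is plausible at level alpha."""
    return [c for c in candidates if rt_p_value(c[1], c[2], sigma) >= alpha]


# Hypothetical identifications, retention times in minutes.
candidates = [("PEPTIDEK", 20.0, 21.0), ("SAMPLER", 35.0, 20.0)]
kept = filter_identifications(candidates, sigma=2.0)
print([c[0] for c in kept])
```

An identification whose observed retention time lies far from the prediction receives a small p-value and is discarded, which is how retention time information can reduce false positive spectrum annotations.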
Affiliation(s)
- Nico Pfeifer: Division for Simulation of Biological Systems, Center for Bioinformatics, Eberhard-Karls University, 72076 Tübingen, Germany
|