For: Huang T, He Z. A linear programming model for protein inference problem in shotgun proteomics. Bioinformatics 2012;28:2956-62. [PMID: 22954624 DOI: 10.1093/bioinformatics/bts540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]

Cited by Other Article(s)

Feng S, Ji HL, Wang H, Zhang B, Sterzenbach R, Pan C, Guo X. MetaLP: An integrative linear programming method for protein inference in metaproteomics. PLoS Comput Biol 2022;18:e1010603. [PMID: 36269761 PMCID: PMC9629623 DOI: 10.1371/journal.pcbi.1010603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 11/02/2022] [Accepted: 09/26/2022] [Indexed: 11/07/2022]

Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022;23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]

Abstract

BACKGROUND

Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach.

RESULTS

We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible.

CONCLUSIONS

In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.
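
The target-decoy competition discussed in this abstract can be illustrated with a minimal sketch. It assumes each peptide-spectrum match is a (score, is_decoy) pair with higher scores being better, and it uses the simple decoys-over-targets ratio; the function name `tdc_fdr` and the data layout are hypothetical and are not taken from the article.

```python
def tdc_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    psms: list of (score, is_decoy) pairs (hypothetical layout).
    Uses the plain ratio decoys / targets, capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to control
    return min(1.0, decoys / targets)


# Toy data: four targets and one decoy score at least 7.0.
psms = [(9.1, False), (8.7, False), (8.2, True),
        (7.9, False), (7.5, False), (6.0, True)]
fdr_loose = tdc_fdr(psms, 7.0)   # 1 decoy / 4 targets
fdr_strict = tdc_fdr(psms, 8.5)  # 0 decoys / 2 targets
```

With very few decoys above the threshold, the estimate can only move in coarse steps, which is one way to picture why small databases are a difficult setting for this kind of estimate.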

Winkler R. ProtyQuant: Comparing label-free shotgun proteomics datasets using accumulated peptide probabilities. J Proteomics 2020;230:103985. [PMID: 32956841 DOI: 10.1016/j.jprot.2020.103985] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2020] [Revised: 08/07/2020] [Accepted: 09/10/2020] [Indexed: 11/20/2022]

Abstract

Comparing multiple label-free shotgun proteomics datasets requires various data processing and formatting steps, including peptide-spectrum matching, protein inference, and quantification, and finally the compilation of result files into a format that allows for downstream analyses. ProtyQuant performs protein inference and quantification calculations, and combines the results of individual datasets into plain text tables. These are lightweight, human-readable, and easy to import into databases or statistical software. ProtyQuant reads validated pepXML from proteomic workflows such as the Trans-Proteomic Pipeline (TPP), which makes it compatible with many commercial and free search engines. For protein inference and quantification, a modified version of the PIPQ program (He et al. 2016) was integrated. In contrast to simple spectral counting, PIPQ sums up peptide probabilities. For assigning peptides to proteins, three algorithms are available: Multiple Counting, Equal Division, and Linear Programming. The accumulated peptide probabilities (app) are used for both tasks: protein probability estimation and quantification. ProtyQuant was tested using a reference dataset for label-free shotgun proteomics, obtained from different concentrations of 48 human UPS proteins spiked into yeast lysate. Compared to ProteinProphet, ProtyQuant detected up to 126 (15%) more proteins in the mixture, applying an equal false positive rate (FPR). Using the app values for label-free quantification showed suitable sensitivity and linearity. Strikingly, the app values represent a realistic measure of 'Protein Presence,' an integral concept of protein probability and quantity. ProtyQuant provides a graphical user interface (GUI) and scripts for console-based processing. It is available (GNU GPL v3) for Windows, Linux, and Docker from https://bitbucket.org/lababi/protyquant/. SIGNIFICANCE: Integrating data from multiple shotgun proteomics experiments overwhelms non-expert researchers. ProtyQuant complements well-established workflows by aiding the comparison of proteins across samples. Importantly, the probability and abundance of proteins are seen from a holistic point of view. The accumulated peptide probability (app) as an integral measure of 'Protein Presence' demonstrated reliable performance for both protein identification and quantification. Using the app as a single measure facilitates the compilation of reports in comparative proteomics.
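
As a rough illustration of two of the assignment schemes named in this abstract, the sketch below accumulates peptide probabilities per protein. The abstract does not define the schemes, so the reading used here is an assumption: Multiple Counting credits the full peptide probability to every matching protein, and Equal Division splits it evenly among them. The function `accumulate` and the input layout are hypothetical and do not reproduce ProtyQuant or PIPQ code.

```python
from collections import defaultdict


def accumulate(peptides, mode="equal_division"):
    """Sum peptide probabilities per protein.

    peptides: list of (probability, [protein ids]) pairs (hypothetical layout).
    mode: "multiple_counting" or "equal_division" (assumed interpretations).
    """
    if mode not in ("multiple_counting", "equal_division"):
        raise ValueError(f"unknown mode: {mode}")
    app = defaultdict(float)
    for prob, proteins in peptides:
        if not proteins:
            continue  # peptide matches no protein, nothing to credit
        if mode == "multiple_counting":
            share = prob                  # full credit to every protein
        else:
            share = prob / len(proteins)  # split credit evenly
        for protein in proteins:
            app[protein] += share
    return dict(app)


# Toy data: one unique peptide and two shared peptides.
peptides = [
    (0.9, ["P1"]),
    (0.8, ["P1", "P2"]),
    (0.5, ["P2", "P3"]),
]
multiple = accumulate(peptides, "multiple_counting")
equal = accumulate(peptides, "equal_division")
```

In this toy input, Multiple Counting gives P2 the full weight of both shared peptides, while Equal Division halves each contribution, so proteins supported only by shared peptides score lower under the second scheme.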

Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020;19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]

Zhong J, Wang J, Ding X, Zhang Z, Li M, Wu FX, Pan Y. Protein Inference from the Integration of Tandem MS Data and Interactome Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017;14:1399-1409. [PMID: 28113634 DOI: 10.1109/tcbb.2016.2601618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]

Abstract

Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, other information has been exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.

Kim M, Eetemadi A, Tagkopoulos I. DeepPep: Deep proteome inference from peptide profiles. PLoS Comput Biol 2017;13:e1005661. [PMID: 28873403 PMCID: PMC5600403 DOI: 10.1371/journal.pcbi.1005661] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/25/2017] [Revised: 09/15/2017] [Accepted: 06/27/2017] [Indexed: 11/24/2022]

Abstract

Protein inference, the identification of the protein set that is the origin of a given peptide profile, is a fundamental challenge in proteomics. We present DeepPep, a deep-convolutional neural network framework that predicts the protein set from a proteomics mixture, given the sequence universe of possible proteins and a target peptide profile. At its core, DeepPep quantifies the change in probabilistic score of peptide-spectrum matches in the presence or absence of a specific protein, hence selecting as candidates the proteins with the largest impact on the peptide profile. Application of the method across datasets argues for its competitive predictive ability (AUC of 0.80±0.18, AUPR of 0.84±0.28) in inferring proteins without the need for peptide detectability, on which the most competitive methods rely. We find that the convolutional neural network architecture outperforms the traditional artificial neural network architectures without convolution layers in protein inference. We expect that similar deep learning architectures that allow learning nonlinear patterns can be further extended to problems in metagenome profiling and cell type inference. The source code of DeepPep and the benchmark datasets used in this study are available at https://deeppep.github.io/DeepPep/.

The accurate identification of proteins in a proteomics sample, called the protein inference problem, is a fundamental challenge in biomedical sciences. Current approaches are based on applications of traditional neural networks, linear optimization and Bayesian techniques. We here present DeepPep, a deep-convolutional neural network framework that predicts the protein set from a standard proteomics mixture, given all protein sequences and a peptide profile. Comparison to leading methods shows that DeepPep has most robust performance with various instruments and datasets. Our results provide evidence that using sequence-level location information of a peptide in the context of proteome sequence can result in more accurate and robust protein inference. We conclude that Deep Learning on protein sequence leads to superior platforms for protein inference that can be further refined with additional features and extended for far reaching applications.

Kepplinger D, Takhar M, Sasaki M, Hollander Z, Smith D, McManus B, McMaster WR, Ng RT, Cohen Freue GV. PGCA: An algorithm to link protein groups created from MS/MS data. PLoS One 2017;12:e0177569. [PMID: 28562641 PMCID: PMC5451011 DOI: 10.1371/journal.pone.0177569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2016] [Accepted: 04/28/2017] [Indexed: 11/19/2022]

Abstract

The quantitation of proteins using shotgun proteomics has gained popularity in the last decades, simplifying sample handling procedures, removing extensive protein separation steps and achieving a relatively high throughput readout. The process starts with the digestion of the protein mixture into peptides, which are then separated by liquid chromatography and sequenced by tandem mass spectrometry (MS/MS). At the end of the workflow, recovering the identity of the proteins originally present in the sample is often a difficult and ambiguous process, because more than one protein identifier may match a set of peptides identified from the MS/MS spectra. To address this identification problem, many MS/MS data processing software tools combine all plausible protein identifiers matching a common set of peptides into a protein group. However, this solution introduces new challenges in studies with multiple experimental runs, which can be characterized by three main factors: i) protein groups' identifiers are local, i.e., they vary run to run, ii) the composition of each group may change across runs, and iii) the supporting evidence of proteins within each group may also change across runs. Since in general there is no conclusive evidence about the absence of proteins in the groups, protein groups need to be linked across different runs in subsequent statistical analyses. We propose an algorithm, called Protein Group Code Algorithm (PGCA), to link groups from multiple experimental runs by forming global protein groups from connected local groups. The algorithm is computationally inexpensive and enables the connection and analysis of lists of protein groups across runs needed in biomarkers studies. We illustrate the identification problem and the stability of the PGCA mapping using 65 iTRAQ experimental runs. Further, we use two biomarker studies to show how PGCA enables the discovery of relevant candidate protein group markers with similar but non-identical compositions in different runs.
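
The idea of forming global protein groups from connected local groups can be sketched with a union-find structure. This is an illustration only and not the published PGCA implementation: it assumes that two local groups are connected whenever they share at least one protein identifier, and the function `link_groups` and its input layout are hypothetical.

```python
def link_groups(runs):
    """Map each local protein group to a global group id.

    runs: {run_id: [set of protein ids, ...]} (hypothetical layout).
    Returns {(run_id, local_index): global_id}. Local groups that share
    a protein id, directly or through a chain of groups, get the same id.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Pass 1: merge all proteins that appear together in any local group.
    for groups in runs.values():
        for group in groups:
            members = sorted(group)
            for protein in members[1:]:
                union(members[0], protein)

    # Pass 2: number the connected components in order of first appearance.
    global_ids = {}
    mapping = {}
    for run_id, groups in runs.items():
        for index, group in enumerate(groups):
            if not group:
                continue  # an empty local group cannot be linked
            root = find(min(group))
            mapping[(run_id, index)] = global_ids.setdefault(root, len(global_ids))
    return mapping


runs = {
    "run1": [{"A", "B"}, {"C"}],
    "run2": [{"B", "D"}, {"C", "E"}],
    "run3": [{"F"}],
}
mapping = link_groups(runs)
```

Here the groups {A, B} and {B, D} end up in one global group because they share B, even though their compositions differ between runs.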

Audain E, Uszkoreit J, Sachsenberg T, Pfeuffer J, Liang X, Hermjakob H, Sanchez A, Eisenacher M, Reinert K, Tabb DL, Kohlbacher O, Perez-Riverol Y. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics. J Proteomics 2017;150:170-182. [DOI: 10.1016/j.jprot.2016.08.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/20/2016] [Revised: 07/30/2016] [Accepted: 08/02/2016] [Indexed: 12/24/2022]

Protein inference: A protein quantification perspective. Comput Biol Chem 2016;63:21-29. [DOI: 10.1016/j.compbiolchem.2016.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/15/2016] [Accepted: 02/01/2016] [Indexed: 01/04/2023]

Computational Methods in Mass Spectrometry-Based Proteomics. Advances in Experimental Medicine and Biology 2016;939:63-89. [PMID: 27807744 DOI: 10.1007/978-981-10-1503-8_4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]

He Z, Huang T, Zhao C, Teng B. Protein Inference. Advances in Experimental Medicine and Biology 2016;919:237-242. [PMID: 27975221 DOI: 10.1007/978-3-319-41448-5_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]

Zhao C, Liu D, Teng B, He Z. BagReg: Protein inference through machine learning. Comput Biol Chem 2015;57:12-20. [DOI: 10.1016/j.compbiolchem.2015.02.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 10/24/2022]

Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014;14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]

Teng B, Huang T, He Z. Decoy-free protein-level false discovery rate estimation. Bioinformatics 2013;30:675-81. [PMID: 23926225 DOI: 10.1093/bioinformatics/btt431] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

Serang O. Concerning the accuracy of Fido and parameter choice. Bioinformatics 2012. [PMID: 23193221 DOI: 10.1093/bioinformatics/bts687] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]