1
Omenn GS, Lane L, Overall CM, Lindskog C, Pineau C, Packer NH, Cristea IM, Weintraub ST, Orchard S, Roehrl MHA, Nice E, Guo T, Van Eyk JE, Liu S, Bandeira N, Aebersold R, Moritz RL, Deutsch EW. The 2023 Report on the Proteome from the HUPO Human Proteome Project. J Proteome Res 2024; 23:532-549. [PMID: 38232391 PMCID: PMC11026053 DOI: 10.1021/acs.jproteome.3c00591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
Affiliation(s)
- Gilbert S. Omenn
  - University of Michigan, Ann Arbor, Michigan 48109, United States
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
- Christopher M. Overall
  - University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Yonsei University, Republic of Korea
- Charles Pineau
  - University Rennes, Inserm U1085, Irset, 35042 Rennes, France
- Susan T. Weintraub
  - University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
- Michael H. A. Roehrl
  - Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, United States
- Tiannan Guo
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory, Westlake University, Hangzhou 310024, Zhejiang Province, China
- Jennifer E. Van Eyk
  - Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Cedars-Sinai Medical Center, 127 South San Vicente Boulevard, Pavilion, 9th Floor, Los Angeles, CA, 90048, United States
- Siqi Liu
  - BGI Group, Shenzhen 518083, China
- Nuno Bandeira
  - University of California, San Diego, La Jolla, CA, 92093, United States
- Ruedi Aebersold
  - Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
  - University of Zurich, 8092 Zurich, Switzerland
- Robert L. Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
2
van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW. Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas. J Proteome Res 2024; 23:185-214. [PMID: 38104260 DOI: 10.1021/acs.jproteome.3c00536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides and 18,267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the "dark" proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and for certain (e.g., CLE, IDA, PSY) but not other (e.g., THIONIN, CAP) signaling peptides families. A machine learning model trained on RNA expression data and protein properties predicts the probability that proteins will be detected. The model aids in discovery of proteins with short half-life (e.g., SIG1,3 and ERF-VII TFs) and for developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.
Affiliation(s)
- Klaas J van Wijk
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Tami Leppert
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Zhi Sun
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Alyssa Kearly
  - Boyce Thompson Institute, Ithaca, New York 14853, United States
- Margaret Li
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Luis Mendoza
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Isabell Guzchenko
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Erica Debley
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Georgia Sauermann
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Pratyush Routray
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Sagunya Malhotra
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Andrew Nelson
  - Boyce Thompson Institute, Ithaca, New York 14853, United States
- Qi Sun
  - Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, United States
- Eric W Deutsch
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
3
Omenn GS, Lane L, Overall CM, Pineau C, Packer NH, Cristea IM, Lindskog C, Weintraub ST, Orchard S, Roehrl MHA, Nice E, Liu S, Bandeira N, Chen YJ, Guo T, Aebersold R, Moritz RL, Deutsch EW. The 2022 Report on the Human Proteome from the HUPO Human Proteome Project. J Proteome Res 2022; 22:1024-1042. [PMID: 36318223 PMCID: PMC10081950 DOI: 10.1021/acs.jproteome.2c00498] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 11/07/2022]
Abstract
The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".
Affiliation(s)
- Gilbert S Omenn
  - University of Michigan, Ann Arbor, Michigan 48109, United States
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
- Charles Pineau
  - French Institute of Health and Medical Research, 35042 Rennes Cedex, France
- Nicolle H Packer
  - Macquarie University, Sydney, New South Wales 2109, Australia
  - Griffith University's Institute for Glycomics, Sydney, New South Wales 2109, Australia
- Susan T Weintraub
  - University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
- Sandra Orchard
  - EMBL-EBI, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- Michael H A Roehrl
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Edouard Nice
  - Monash University, Clayton, Victoria 3800, Australia
- Siqi Liu
  - BGI Group, Shenzhen 518083, P. R. China
- Nuno Bandeira
  - University of California, San Diego, La Jolla, California 92093, United States
- Yu-Ju Chen
  - National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
- Tiannan Guo
  - Westlake University Guomics Laboratory of Big Proteomic Data, Hangzhou 310024, Zhejiang Province, P. R. China
- Ruedi Aebersold
  - Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
4
Kalyuzhnyy A, Eyers PA, Eyers CE, Bowler-Barnett E, Martin MJ, Sun Z, Deutsch EW, Jones AR. Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation. J Proteome Res 2022; 21:1510-1524. [PMID: 35532924 PMCID: PMC9171898 DOI: 10.1021/acs.jproteome.2c00131] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/11/2022]
Abstract
Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the majority of sites in PSP). We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
Affiliation(s)
- Anton Kalyuzhnyy
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
  - Computational Biology Facility, Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
- Patrick A Eyers
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
- Claire E Eyers
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
  - Centre for Proteome Research, Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
- Emily Bowler-Barnett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, U.K.
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, U.K.
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Andrew R Jones
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
  - Computational Biology Facility, Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, U.K.
5
Deutsch EW, Omenn GS, Sun Z, Maes M, Pernemalm M, Palaniappan KK, Letunica N, Vandenbrouck Y, Brun V, Tao SC, Yu X, Geyer PE, Ignjatovic V, Moritz RL, Schwenk JM. Advances and Utility of the Human Plasma Proteome. J Proteome Res 2021; 20:5241-5263. [PMID: 34672606 PMCID: PMC9469506 DOI: 10.1021/acs.jproteome.1c00657] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Indexed: 02/08/2023]
Abstract
The study of proteins circulating in blood offers tremendous opportunities to diagnose, stratify, or possibly prevent diseases. With recent technological advances and the urgent need to understand the effects of COVID-19, the proteomic analysis of blood-derived serum and plasma has become even more important for studying human biology and pathophysiology. Here we provide views and perspectives about technological developments and possible clinical applications that use mass-spectrometry (MS)- or affinity-based methods. We discuss examples where plasma proteomics contributed valuable insights into SARS-CoV-2 infections, aging, and hemostasis and the opportunities offered by combining proteomics with genetic data. As a contribution to the Human Proteome Organization (HUPO) Human Plasma Proteome Project (HPPP), we present the Human Plasma PeptideAtlas build 2021-07 that comprises 4395 canonical and 1482 additional nonredundant human proteins detected in 240 MS-based experiments. In addition, we report the new Human Extracellular Vesicle PeptideAtlas 2021-06, which comprises five studies and 2757 canonical proteins detected in extracellular vesicles circulating in blood, of which 74% (2047) are in common with the plasma PeptideAtlas. Our overview summarizes the recent advances, impactful applications, and ongoing challenges for translating plasma proteomics into utility for precision medicine.
Affiliation(s)
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Gilbert S Omenn
  - Institute for Systems Biology, Seattle, Washington 98109, United States
  - Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Michal Maes
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Maria Pernemalm
  - Department of Oncology and Pathology/Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
- Natasha Letunica
  - Murdoch Children's Research Institute, 50 Flemington Road, Parkville 3052, Victoria, Australia
- Yves Vandenbrouck
  - Université Grenoble Alpes, CEA, Inserm U1292, Grenoble 38000, France
- Virginie Brun
  - Université Grenoble Alpes, CEA, Inserm U1292, Grenoble 38000, France
- Sheng-Ce Tao
  - Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, B207 SCSB Building, 800 Dongchuan Road, Shanghai 200240, China
- Xiaobo Yu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing 102206, China
- Philipp E Geyer
  - OmicEra Diagnostics GmbH, Behringstr. 6, 82152 Planegg, Germany
- Vera Ignjatovic
  - Murdoch Children's Research Institute, 50 Flemington Road, Parkville 3052, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, 50 Flemington Road, Parkville 3052, Victoria, Australia
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Jochen M Schwenk
  - Affinity Proteomics, Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Tomtebodavägen 23, SE-171 65 Solna, Sweden
6
Omenn GS, Lane L, Overall CM, Paik YK, Cristea IM, Corrales FJ, Lindskog C, Weintraub S, Roehrl MHA, Liu S, Bandeira N, Srivastava S, Chen YJ, Aebersold R, Moritz RL, Deutsch EW. Progress Identifying and Analyzing the Human Proteome: 2021 Metrics from the HUPO Human Proteome Project. J Proteome Res 2021; 20:5227-5240. [PMID: 34670092 DOI: 10.1021/acs.jproteome.1c00590] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/12/2022]
Abstract
The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 357 (92.8%) of the 19 778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.
Affiliation(s)
- Gilbert S Omenn
  - University of Michigan, Ann Arbor, Michigan 48109, United States
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Young-Ki Paik
  - Yonsei Proteome Research Center and Yonsei University, Seoul 03722, Korea
- Ileana M Cristea
  - Princeton University, Princeton, New Jersey 08544, United States
- Susan Weintraub
  - University of Texas Health, San Antonio, San Antonio, Texas 78229-3900, United States
- Michael H A Roehrl
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Siqi Liu
  - BGI Group, Shenzhen 518083, China
- Nuno Bandeira
  - University of California, San Diego, La Jolla, California 92093, United States
- Yu-Ju Chen
  - National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
- Ruedi Aebersold
  - ETH-Zurich and University of Zurich, 8092 Zurich, Switzerland
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
7
Omenn GS, Lane L, Overall CM, Cristea IM, Corrales FJ, Lindskog C, Paik YK, Van Eyk JE, Liu S, Pennington SR, Snyder MP, Baker MS, Bandeira N, Aebersold R, Moritz RL, Deutsch EW. Research on the Human Proteome Reaches a Major Milestone: >90% of Predicted Human Proteins Now Credibly Detected, According to the HUPO Human Proteome Project. J Proteome Res 2020; 19:4735-4746. [PMID: 32931287 DOI: 10.1021/acs.jproteome.0c00485] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 12/11/2022]
Abstract
According to the 2020 Metrics of the HUPO Human Proteome Project (HPP), expression has now been detected at the protein level for >90% of the 19 773 predicted proteins coded in the human genome. The HPP annually reports on progress made throughout the world toward credibly identifying and characterizing the complete human protein parts list and promoting proteomics as an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2020-01 classified 17 874 proteins as PE1, having strong protein-level evidence, up 180 from 17 694 one year earlier. These represent 90.4% of the 19 773 predicted coding genes (all PE1,2,3,4 proteins in neXtProt). Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), was reduced by 230 from 2129 to 1899 since the neXtProt 2019-01 release. PeptideAtlas is the primary source of uniform reanalysis of raw mass spectrometry data for neXtProt, supplemented this year with extensive data from MassIVE. PeptideAtlas 2020-01 added 362 canonical proteins between 2019 and 2020 and MassIVE contributed 84 more, many of which converted PE1 entries based on non-MS evidence to the MS-based subgroup. The 19 Biology and Disease-driven B/D-HPP teams continue to pursue the identification of driver proteins that underlie disease states, the characterization of regulatory mechanisms controlling the functions of these proteins, their proteoforms, and their interactions, and the progression of transitions from correlation to coexpression to causal networks after system perturbations. And the Human Protein Atlas published Blood, Brain, and Metabolic Atlases.
Affiliation(s)
- Gilbert S Omenn
  - University of Michigan, Ann Arbor, Michigan 48109, United States
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Ileana M Cristea
  - Princeton University, Princeton, New Jersey 08544, United States
- Siqi Liu
  - BGI Group, Shenzhen 518083, China
- Mark S Baker
  - Macquarie University, Macquarie Park, NSW 2109, Australia
- Nuno Bandeira
  - University of California, San Diego, La Jolla, California 92093, United States
- Ruedi Aebersold
  - ETH-Zurich and University of Zurich, 8092 Zurich, Switzerland
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
8
Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Pennington S, Snyder MP, Baker MS, Deutsch EW. Progress on Identifying and Characterizing the Human Proteome: 2019 Metrics from the HUPO Human Proteome Project. J Proteome Res 2019; 18:4098-4107. [PMID: 31430157 PMCID: PMC6898754 DOI: 10.1021/acs.jproteome.9b00434] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/17/2023]
Abstract
The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2019-01-11 contains 17 694 proteins with strong protein-level evidence (PE1), compliant with HPP Guidelines for Interpretation of MS Data v2.1; these represent 89% of all 19 823 neXtProt predicted coding genes (all PE1,2,3,4 proteins), up from 17 470 one year earlier. Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalyzed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17 000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 160 publications in 2018, bringing proteomics to a broad array of biomedical research.
Affiliation(s)
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Christopher M. Overall
  - Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, British Columbia V6T 1Z3, Canada
- Jochen M. Schwenk
  - Science for Life Laboratory, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Young-Ki Paik
  - Yonsei Proteome Research Center, Yonsei University, Room 425, Building #114, 50 Yonsei-ro, Seodaemoon-ku, Seoul 120-749, South Korea
- Jennifer E. Van Eyk
  - Advanced Clinical BioSystems Research Institute, Cedars Sinai Precision Biomarker Laboratories, Barbra Streisand Women’s Heart Center, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Siqi Liu
  - BGI Group-Shenzhen, Yantian District, Shenzhen 518083, China
- Stephen Pennington
  - School of Medicine, University College Dublin, Conway Institute Belfield, Dublin 4, Ireland
- Michael P. Snyder
  - Department of Genetics, Stanford University, Alway Building, 300 Pasteur Drive and 3165 Porter Drive, Palo Alto, California 94304, United States
- Mark S. Baker
  - Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Macquarie University, 75 Talavera Road, North Ryde, NSW 2109, Australia
- Eric W. Deutsch
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
9
Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, Van Eyk JE, Liu S, Snyder M, Baker MS, Deutsch EW. Progress on Identifying and Characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project. J Proteome Res 2018; 17:4031-4041. [PMID: 30099871 PMCID: PMC6387656 DOI: 10.1021/acs.jproteome.8b00441] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 02/07/2023]
Abstract
The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17 470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17 008 in release 2017-01-23 and 13 975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16 092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15 798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.
Affiliation(s)
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Christopher M. Overall
  - Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, BC V6T 1Z3, Canada
- Jochen M. Schwenk
  - Science for Life Laboratory, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Young-Ki Paik
  - Yonsei Proteome Research Center, Room 425, Building #114, Yonsei University, 50 Yonsei-ro, Seodaemoon-ku, Seoul 120-749, Korea
- Jennifer E. Van Eyk
  - Advanced Clinical BioSystems Research Institute, Cedars Sinai Precision Biomarker Laboratories, Barbra Streisand Women’s Heart Center, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Siqi Liu
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9148, United States
- Michael Snyder
  - Department of Genetics, Stanford University, Alway Building, 300 Pasteur Drive, 3165 Porter Drive, Palo Alto, 94304, United States
- Mark S. Baker
  - Department of Biomedical Sciences, Macquarie University, NSW 2109, Australia
- Eric W. Deutsch
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
10
|
Siddiqui O, Zhang H, Guan Y, Omenn GS. Chromosome 17 Missing Proteins: Recent Progress and Future Directions as Part of the neXt-MP50 Challenge. J Proteome Res 2018; 17:4061-4071. [PMID: 30280577 DOI: 10.1021/acs.jproteome.8b00442] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
The Chromosome-centric Human Proteome Project (C-HPP) announced in September 2016 an initiative to accelerate progress on the detection and characterization of neXtProt PE2,3,4 "missing proteins" (MPs), with a mandate to each chromosome team to find about 50 MPs over 2 years. Here we report major progress toward the neXt-MP50 challenge with 43 newly validated Chr 17 PE1 proteins, of which 25 were based on mass spectrometry, 12 on protein-protein interactions, 3 on a combination of MS and PPI, and 3 on other types of data. Notable among these new PE1 proteins were five keratin-associated proteins, a single olfactory receptor, and five additional membrane-embedded proteins. We evaluate the prospects of finding the remaining 105 MPs coded for on Chr 17, focusing on mass spectrometry and protein-protein interaction approaches. We present a list of 35 prioritized MPs with specific approaches that may be used in further MS and PPI experimental studies. Additionally, we demonstrate how in silico studies can be used to capture individual peptides from major data repositories, documenting one MP that appears to be a strong candidate for PE1. We are close to our goal of finding 50 MPs for Chr 17.
Affiliation(s)
- Omer Siddiqui
- Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan 48109, United States; Department of Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
|
11
|
Abstract
The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 ( https://hupo.org/Guidelines ), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 "missing proteins" (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.
Affiliation(s)
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States; Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Human Protein Science, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Emma K Lundberg
- SciLifeLab Stockholm and School of Biotechnology, KTH, Karolinska Institutet Science Park, Tomtebodavägen 23, SE-171 65 Solna, Sweden
- Christopher M Overall
- Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, British Columbia V6T 1Z3, Canada
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
|
12
|
Abstract
Proteomics-based biological research is greatly expanded by high-quality mass spectrometry studies, which are themselves enabled by access to quality mass spectrometry resources, such as high-quality curated proteome data repositories. We present a PeptideAtlas for the domestic chicken, containing an extensive and robust collection of chicken tissue and plasma samples with substantial value for the chicken proteomics community for protein validation and design of downstream targeted proteome quantitation. The chicken PeptideAtlas contains 6646 canonical proteins at a protein FDR of 1.3%, derived from ∼100 000 peptides at a peptide level FDR of 0.1%. The rich collection of readily accessible data is easily mined for the purposes of data validation and experimental planning, particularly in the realm of developing proteome quantitation workflows. Herein we demonstrate the use of the atlas to mine information on common chicken acute-phase proteins and biomarkers for cancer detection research, as well as their localization and polymorphisms. This wealth of information will support future proteome-based research using this highly important agricultural organism in pursuit of both chicken and human health outcomes.
Affiliation(s)
- James McCord
- W.M. Keck FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
- David C Muddiman
- W.M. Keck FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
|
13
|
Omenn GS, Lane L, Lundberg EK, Beavis RC, Overall CM, Deutsch EW. Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications. J Proteome Res 2016; 15:3951-3960. [PMID: 27487407 PMCID: PMC5129622 DOI: 10.1021/acs.jproteome.6b00511] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Indexed: 02/03/2023]
Abstract
The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list, the draft human proteome, including confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on sequence variants, post-translational modifications (PTMs), and splice isoforms of those proteins; and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community. PeptideAtlas and GPMDB reanalyze all major human mass spectrometry data sets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings to present the most up-to-date authoritative compendium of the human proteome. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.1 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [www.thehpp.org/guidelines]. The Human Proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the Guidelines v1.0 from 2012 but now have insufficient evidence due to the agreed-upon more stringent Guidelines v2.0 to reduce false positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely mapping (proteotypic) peptides of at least 9 aa in length. There are 2,949 missing proteins (PE2+3+4) as the baseline for submissions for this fourth annual C-HPP special issue of Journal of Proteome Research. PeptideAtlas has 14,629 canonical (plus 1187 uncertain and 1755 redundant) entries. GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence. neXtProt, PeptideAtlas, and GPMDB are rich resources of information about post-translational modifications (PTMs), single amino acid variants (SAAVs), and splice isoforms. Meanwhile, the Biology- and Disease-driven (B/D)-HPP has created comprehensive SRM resources, generated popular protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.
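The peptide evidence rule stated in this abstract (two non-nested, uniquely mapping peptides of at least 9 aa) can be illustrated with a minimal Python sketch. This is not the neXtProt or PeptideAtlas implementation: the `Peptide` class, its fields, and the function names are hypothetical, and the sketch checks only the three conditions named in the abstract, not any other criteria the full HPP Guidelines may impose.

```python
from dataclasses import dataclass
from itertools import combinations

MIN_LENGTH = 9  # minimum peptide length in residues, as stated in the abstract


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record for one observed peptide mapped onto a protein."""
    sequence: str
    start: int              # 1-based start position within the protein (assumed convention)
    uniquely_mapping: bool  # True if the peptide maps to this protein entry only

    @property
    def end(self) -> int:
        # Inclusive end position of the peptide within the protein
        return self.start + len(self.sequence) - 1


def is_nested(a: Peptide, b: Peptide) -> bool:
    """Return True if either peptide's span lies entirely inside the other's."""
    a_in_b = a.start >= b.start and a.end <= b.end
    b_in_a = b.start >= a.start and b.end <= a.end
    return a_in_b or b_in_a


def meets_two_peptide_rule(peptides: list[Peptide]) -> bool:
    """Check for at least one pair of non-nested, uniquely mapping peptides of >= 9 aa."""
    eligible = [
        p for p in peptides
        if p.uniquely_mapping and len(p.sequence) >= MIN_LENGTH
    ]
    return any(not is_nested(a, b) for a, b in combinations(eligible, 2))
```

Under these assumptions, a protein supported by two separate 10-residue unique peptides would pass, while one supported by a peptide and a truncated copy of that same peptide would not, because the shorter peptide is nested in the longer one.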
Affiliation(s)
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
- Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Human Protein Science, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Emma K. Lundberg
- SciLifeLab Stockholm and School of Biotechnology, KTH, Karolinska Institutet Science Park, Tomtebodavägen 23, SE-171 65 Solna, Sweden
- Ronald C. Beavis
- Biochemistry & Medical Genetics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- Christopher M. Overall
- Biochemistry and Molecular Biology, and Oral Biological and Medical Sciences University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, BC V6T 1Z3, Canada
- Eric W. Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
|
14
|
Abstract
Quantification of individual proteins and even entire proteomes is an important theme in proteomics research. Quantitative proteomics is an approach to obtain quantitative information about proteins in a sample. Compared to qualitative or semi-quantitative proteomics, this approach can provide more insight into the effects of a specific stimulus, such as a change in the expression level of a protein and its posttranslational modifications, or to a panel of proposed biomarkers in a given disease state. Proteomics methodologies, along with a variety of bioinformatics approaches, are a major tool in quantitative proteomics. As the theory and technological aspects underlying the proteomics methodologies will be extensively described in Chap. 20, and protein identification as a prerequisite of quantification has been discussed in Chap. 17, we will focus on the quantitative proteomics bioinformatics algorithms and software tools in this chapter. Our goal is to provide researchers and newcomers a rational framework to select suitable bioinformatics tools for data analysis, interpretation, and integration in protein quantification. Before doing so, a brief overview of quantitative proteomics is provided.
Affiliation(s)
- Yun Chen
- School of Pharmacy, Nanjing Medical University, 818 Tian Yuan East Road, Nanjing, 211166, China
- Fuqiang Wang
- School of Pharmacy, Nanjing Medical University, 818 Tian Yuan East Road, Nanjing, 211166, China
- Feifei Xu
- School of Pharmacy, Nanjing Medical University, 818 Tian Yuan East Road, Nanjing, 211166, China
- Ting Yang
- School of Pharmacy, Nanjing Medical University, 818 Tian Yuan East Road, Nanjing, 211166, China
|
15
|
Vialas V, Sun Z, Reales-Calderón JA, Hernáez ML, Casas V, Carrascal M, Abián J, Monteoliva L, Deutsch EW, Moritz RL, Gil C. A comprehensive Candida albicans PeptideAtlas build enables deep proteome coverage. J Proteomics 2015; 131:122-130. [PMID: 26493587 DOI: 10.1016/j.jprot.2015.10.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/10/2015] [Revised: 10/07/2015] [Accepted: 10/15/2015] [Indexed: 12/29/2022]
Abstract
To provide new and expanded proteome documentation of the opportunistic pathogen Candida albicans, we have developed new protein extraction and analysis routines to provide a new, extended and enhanced version of the C. albicans PeptideAtlas. Two new datasets, resulting from experiments consisting of exhaustive subcellular fractionations and different growing conditions, plus two additional datasets from previous experiments on the surface and the secreted proteomes, have been incorporated to increase the coverage of the proteome. High resolution precursor mass spectrometry (MS) and ion trap tandem MS spectra were analyzed with three different search engines using a database containing allele-specific sequences. This approach, novel for a large-scale C. albicans proteomics project, was combined with the post-processing and filtering implemented in the Trans Proteomic Pipeline consistently used in the PeptideAtlas project and resulted in 49,372 additional peptides (3-fold increase) and 1630 more proteins (1.6-fold increase) identified in the new C. albicans PeptideAtlas with respect to the previous build. A total of 71,310 peptides and 4174 canonical (minimal non-redundant set) proteins (4115 if one protein per pair of alleles is considered) were identified, representing 66% of the 6218 proteins in the predicted proteome. This makes the new PeptideAtlas build the most comprehensive C. albicans proteomics resource available and the only large-scale one with detections of individual alleles.
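The fold increases and the coverage figure quoted in this abstract follow from the reported counts by simple arithmetic. The short Python check below is only a sketch: the variable names are illustrative, and it assumes that the 66% figure is computed from the allele-collapsed count of 4115 proteins, which the abstract does not state explicitly.

```python
# Counts reported in the abstract
total_peptides = 71_310
added_peptides = 49_372
total_proteins = 4_174
added_proteins = 1_630
allele_collapsed_proteins = 4_115  # one protein per pair of alleles
predicted_proteome = 6_218

# Sizes implied for the previous build
previous_peptides = total_peptides - added_peptides
previous_proteins = total_proteins - added_proteins

# Fold increase = new total / previous total
peptide_fold = total_peptides / previous_peptides
protein_fold = total_proteins / previous_proteins

# Coverage of the predicted proteome (assumes the allele-collapsed count is the numerator)
coverage = allele_collapsed_proteins / predicted_proteome

print(f"previous build: {previous_peptides} peptides, {previous_proteins} proteins")
print(f"peptide fold increase: {peptide_fold:.2f}")
print(f"protein fold increase: {protein_fold:.2f}")
print(f"proteome coverage: {coverage:.1%}")
```

The implied previous-build sizes (about 22,000 peptides and just over 2500 proteins) agree with the figures given in the abstract of the earlier C. albicans PeptideAtlas paper listed below.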
Affiliation(s)
- Vital Vialas
- Departamento de Microbiología II, Universidad Complutense Madrid (UCM), Spain; Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Zhi Sun
- Institute for Systems Biology, 401, Terry Ave North, Seattle, WA 98109, USA
- Jose A Reales-Calderón
- Departamento de Microbiología II, Universidad Complutense Madrid (UCM), Spain; Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- María L Hernáez
- Unidad de Proteómica, Universidad Complutense de Madrid-Parque Científico de Madrid (UCM-PCM), Spain
- Vanessa Casas
- CSIC/UAB Proteomics Laboratory, IIBB-CSIC, IDIBAPS, Barcelona, Spain
- Joaquín Abián
- CSIC/UAB Proteomics Laboratory, IIBB-CSIC, IDIBAPS, Barcelona, Spain
- Lucía Monteoliva
- Departamento de Microbiología II, Universidad Complutense Madrid (UCM), Spain; Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Eric W Deutsch
- Institute for Systems Biology, 401, Terry Ave North, Seattle, WA 98109, USA
- Robert L Moritz
- Institute for Systems Biology, 401, Terry Ave North, Seattle, WA 98109, USA
- Concha Gil
- Departamento de Microbiología II, Universidad Complutense Madrid (UCM), Spain; Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; Corresponding author at: Departamento de Microbiología II, Universidad Complutense Madrid (UCM), Facultad de Farmacia, Plaza Ramón y Cajal s/n, 28040, Madrid, Spain.
|
16
|
Omenn GS, Lane L, Lundberg EK, Beavis RC, Nesvizhskii AI, Deutsch EW. Metrics for the Human Proteome Project 2015: Progress on the Human Proteome and Guidelines for High-Confidence Protein Identification. J Proteome Res 2015; 14:3452-60. [PMID: 26155816 DOI: 10.1021/acs.jproteome.5b00499] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Indexed: 11/29/2022]
Abstract
Remarkable progress continues on the annotation of the proteins identified in the Human Proteome and on finding credible proteomic evidence for the expression of "missing proteins". Missing proteins are those with no previous protein-level evidence or insufficient evidence to make a confident identification upon reanalysis in PeptideAtlas and curation in neXtProt. Enhanced with several major new data sets published in 2014, the human proteome presented as neXtProt, version 2014-09-19, has 16,491 unique confident proteins (PE level 1), up from 13,664 at 2012-12 and 15,646 at 2013-09. That leaves 2948 missing proteins from genes classified having protein existence level PE 2, 3, or 4, as well as 616 dubious proteins at PE 5. Here, we document the progress of the HPP and discuss the importance of assessing the quality of evidence, confirming automated findings and considering alternative protein matches for spectra and peptides. We provide guidelines for proteomics investigators to apply in reporting newly identified proteins.
Affiliation(s)
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
- Lydie Lane
- CALIPHO Group, Swiss Institute of Bioinformatics, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Emma K Lundberg
- SciLifeLab Stockholm and School of Biotechnology, KTH, Karolinska Institutet Science Park, Tomtebodavägen 23, SE-171 65 Solna, Sweden
- Ronald C Beavis
- Biochemistry & Medical Genetics, University of Manitoba, Winnipeg, MB, Canada R3T 2N2
- Alexey I Nesvizhskii
- Pathology Department, University of Michigan, Medical Science Building 1, M4237, Ann Arbor, Michigan 48109-5602, United States
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
|
17
|
Abstract
One goal of the Human Proteome Project is to identify at least one protein product for each of the ∼20,000 human protein-coding genes. As of October 2014, however, there are 3564 genes (18%) that have no or insufficient evidence of protein existence (PE), as curated by neXtProt; these comprise 2948 PE2-4 missing proteins and 616 PE5 dubious protein entries. We conducted a systematic examination of the 616 PE5 protein entries using cutting-edge protein structure and function modeling methods. Compared to a random sample of high-confidence PE1 proteins, the putative PE5 proteins were found to be over-represented in the membrane and cell surface proteins and peptides fold families. Detailed functional analyses show that most PE5 proteins, if expressed, would belong to transporters and receptors localized in the plasma membrane compartment. The results suggest that experimental difficulty in identifying membrane-bound proteins and peptides could have precluded their detection in mass spectrometry and that special enrichment techniques with improved sensitivity for membrane proteins could be important for the characterization of the PE5 "dark matter" of the human proteome. Finally, we identify 66 high scoring PE5 protein entries and find that six of them were reported in recent mass spectrometry databases; an illustrative annotation of these six is provided. This work illustrates a new approach to examine the potential folding and function of the dubious proteins comprising PE5, which we will next apply to the far larger group of missing proteins comprising PE2-4.
Affiliation(s)
- Qiwen Dong
- School of Computer Science, Fudan University, Shanghai, 204433, China
|
18
|
Deutsch EW, Sun Z, Campbell D, Kusebauch U, Chu CS, Mendoza L, Shteynberg D, Omenn GS, Moritz RL. State of the Human Proteome in 2014/2015 As Viewed through PeptideAtlas: Enhancing Accuracy and Coverage through the AtlasProphet. J Proteome Res 2015; 14:3461-73. [PMID: 26139527 DOI: 10.1021/acs.jproteome.5b00500] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Indexed: 11/28/2022]
Abstract
The Human PeptideAtlas is a compendium of the highest quality peptide identifications from over 1000 shotgun mass spectrometry proteomics experiments collected from many different laboratories, all reanalyzed through a uniform processing pipeline. The latest 2015-03 build contains substantially more input data than past releases, is mapped to a recent version of our merged reference proteome, and uses improved informatics processing and the development of the AtlasProphet to provide the highest quality results. Within the set of ∼20,000 neXtProt primary entries, 14,070 (70%) are confidently detected in the latest build, 5% are ambiguous, 9% are redundant, leaving the total percentage of proteins for which there are no mapping detections at just 16% (3166), all derived from over 133 million peptide-spectrum matches identifying more than 1 million distinct peptides using AtlasProphet to characterize and classify the protein matches. Improved handling for detection and presentation of single amino-acid variants (SAAVs) reveals the detection of 5326 uniquely mapping SAAVs across 2794 proteins. With such a large amount of data, the control of false positives is a challenge. We present the methodology and results for maintaining rigorous quality along with a discussion of the implications of the remaining sources of errors in the build.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- David Campbell
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Ulrike Kusebauch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Caroline S Chu
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- David Shteynberg
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Gilbert S Omenn
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States; Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
- Robert L Moritz
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
|
19
|
Bundgaard L, Jacobsen S, Sørensen MA, Sun Z, Deutsch EW, Moritz RL, Bendixen E. The Equine PeptideAtlas: a resource for developing proteomics-based veterinary research. Proteomics 2014; 14:763-73. [PMID: 24436130 DOI: 10.1002/pmic.201300398] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/10/2013] [Revised: 12/13/2013] [Accepted: 12/15/2013] [Indexed: 11/11/2022]
Abstract
Progress in MS-based methods for veterinary research and diagnostics is lagging behind compared to the human research, and proteome data of domestic animals is still not well represented in open source data repositories. This is particularly true for the equine species. Here we present a first Equine PeptideAtlas encompassing high-resolution tandem MS analyses of 51 samples representing a selection of equine tissues and body fluids from healthy and diseased animals. The raw data were processed through the Trans-Proteomic Pipeline to yield high quality identification of proteins and peptides. The current release comprises 24 131 distinct peptides representing 2636 canonical proteins observed at false discovery rates of 0.2% at the peptide level and 1.4% at the protein level. Data from the Equine PeptideAtlas are available for experimental planning, validation of new datasets, and as a proteomic data mining resource. The advantages of the Equine PeptideAtlas are demonstrated by examples of mining the contents for information on potential and well-known equine acute phase proteins, which have extensive general interest in the veterinary clinic. The extracted information will support further analyses, and emphasizes the value of the Equine PeptideAtlas as a resource for the design of targeted quantitative proteomic studies.
Affiliation(s)
- Louise Bundgaard
- Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, Aarhus, Denmark
|
20
|
Omenn GS. The strategy, organization, and progress of the HUPO Human Proteome Project. J Proteomics 2013; 100:3-7. [PMID: 24145142 DOI: 10.1016/j.jprot.2013.10.012] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 07/19/2013] [Revised: 10/08/2013] [Accepted: 10/09/2013] [Indexed: 02/04/2023]
Abstract
The Human Proteome Project is a major, comprehensive initiative of the Human Proteome Organization. This global collaborative effort aims to identify and characterize at least one protein product and many PTM, SAP, and splice variant isoforms from the 20,300 human protein-coding genes. The deliverables are an extensive parts list and an array of technology platforms, reagents, spectral libraries, and linked knowledge bases that advance the field and facilitate the use of proteomics by a much wider community of life scientists. Such enablement will help address the Grand Challenge of using proteomics to bridge major gaps between evidence of genomic variation and diverse phenotypes. Biological significance: The HUPO Human Proteome Project (HPP) has made an outstanding launch, including a special issue of the Journal of Proteome Research on the Chromosome-centric HPP with a total of 48 articles. This article is part of a Special Issue: Can Proteomics Fill the Gap Between Genomics and Phenotypes?
Affiliation(s)
- Gilbert S Omenn
- Center for Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA.
|
21
|
Vialas V, Sun Z, Loureiro y Penha CV, Carrascal M, Abián J, Monteoliva L, Deutsch EW, Aebersold R, Moritz RL, Gil C. A Candida albicans PeptideAtlas. J Proteomics 2013; 97:62-8. [PMID: 23811049 DOI: 10.1016/j.jprot.2013.06.020] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/09/2013] [Accepted: 06/16/2013] [Indexed: 01/08/2023]
Abstract
Candida albicans public proteomic datasets, though growing steadily in the last few years, still have a very limited presence in online repositories. We report here the creation of a C. albicans PeptideAtlas comprising nearly 22,000 distinct peptides at a 0.24% False Discovery Rate (FDR) that account for over 2500 canonical proteins at a 1.2% FDR. Based on data from 16 experiments, we attained coverage of 41% of the C. albicans open reading frame sequences (ORFs) in the database used for the searches. This PeptideAtlas provides several useful features, including comprehensive protein and peptide-centered search capabilities and visualization tools that establish a solid basis for the study of basic biological mechanisms key to virulence and pathogenesis such as dimorphism, adherence, and apoptosis. Further, it is a valuable resource for the selection of candidate proteotypic peptides for targeted proteomic experiments via Selected Reaction Monitoring (SRM) or SWATH-MS. Biological significance: This C. albicans PeptideAtlas resolves the previous absence of fungal pathogens in the PeptideAtlas project. It represents the most extensive characterization of the proteome of this fungus that exists up to the current date, including evidence for uncharacterized ORFs. Through its web interface, PeptideAtlas supports the study of interesting proteins related to basic biological mechanisms key to virulence such as apoptosis, dimorphism and adherence. It also provides a valuable resource to select candidate proteotypic peptides for future (SRM) targeted proteomic experiments. This article is part of a Special Issue entitled: Trends in Microbial Proteomics.
Affiliation(s)
- Vital Vialas
- Dept. Microbiología II, Universidad Complutense de Madrid, Madrid, Spain; IRYCIS: Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain.
- Zhi Sun
- Institute for Systems Biology, Seattle, WA, USA
- Carla Verónica Loureiro y Penha
- Dept. Microbiología II, Universidad Complutense de Madrid, Madrid, Spain; IRYCIS: Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Montserrat Carrascal
- CSIC/UAB Proteomics Laboratory, Instituto de Investigaciones Biomédicas de Barcelona-Consejo Superior de Investigaciones Científicas, Spain
- Joaquín Abián
- CSIC/UAB Proteomics Laboratory, Instituto de Investigaciones Biomédicas de Barcelona-Consejo Superior de Investigaciones Científicas, Spain
- Lucía Monteoliva
- Dept. Microbiología II, Universidad Complutense de Madrid, Madrid, Spain; IRYCIS: Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
- Concha Gil
- Dept. Microbiología II, Universidad Complutense de Madrid, Madrid, Spain; IRYCIS: Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain.
|