1
van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW. Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas. J Proteome Res 2024; 23:185-214. [PMID: 38104260 DOI: 10.1021/acs.jproteome.3c00536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. A total of 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides, 18,267 proteins at the highest confidence level, and 3396 lower-confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the "dark" proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and certain signaling peptide families (e.g., CLE, IDA, PSY) but not others (e.g., THIONIN, CAP). A machine learning model trained on RNA expression data and protein properties predicts the probability that a protein will be detected. The model aids in discovering proteins with short half-lives (e.g., SIG1,3 and ERF-VII TFs) and in developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.
Affiliation(s)
- Klaas J van Wijk
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Tami Leppert
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Alyssa Kearly
- Boyce Thompson Institute, Ithaca, New York 14853, United States
- Margaret Li
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Isabell Guzchenko
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Erica Debley
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Georgia Sauermann
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Pratyush Routray
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Sagunya Malhotra
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Andrew Nelson
- Boyce Thompson Institute, Ithaca, New York 14853, United States
- Qi Sun
- Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, United States
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
2
Reddy PJ, Sun Z, Wippel HH, Baxter D, Swearingen K, Shteynberg DD, Midha MK, Caimano MJ, Strle K, Choi Y, Chan AP, Schork NJ, Moritz RL. Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.16.545244. [PMID: 37398146 PMCID: PMC10312716 DOI: 10.1101/2023.06.16.545244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Lyme disease, caused by an infection with the spirochete Borrelia burgdorferi, is the most common vector-borne disease in North America. B. burgdorferi strains harbor extensive genomic and proteomic variability, and further comparison is key to understanding the spirochete's infectivity and the biological impacts of identified sequence variants. To achieve this goal, both transcript and mass spectrometry (MS)-based proteomics were applied to assemble peptide datasets of laboratory strains B31, MM1, B31-ML23, infective isolates B31-5A4, B31-A3, and 297, and other public datasets, to provide a publicly available Borrelia PeptideAtlas (http://www.peptideatlas.org/builds/borrelia/). Included is information on the total proteome, secretome, and membrane proteome of these B. burgdorferi strains. Proteomic data collected from 35 different experiment datasets, with a total of 855 mass spectrometry runs, identified 76,936 distinct peptides at a 0.1% peptide false-discovery rate, which map to 1,221 canonical proteins (924 core canonical and 297 noncore canonical) and cover 86% of the total base B31 proteome. The diverse proteomic information from multiple isolates with credible data presented by the Borrelia PeptideAtlas can be useful to pinpoint potential protein targets that are common to infective isolates and may be key in the infection process.
Affiliation(s)
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington, USA
- David Baxter
- Institute for Systems Biology, Seattle, Washington, USA
- Klemen Strle
- Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts, USA
- Yongwook Choi
- Translational Genomics Research Institute, Phoenix, Arizona, USA
- Agnes P. Chan
- Translational Genomics Research Institute, Phoenix, Arizona, USA
3
van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW. Mapping the Arabidopsis thaliana proteome in PeptideAtlas and the nature of the unobserved (dark) proteome; strategies towards a complete proteome. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.01.543322. [PMID: 37333403 PMCID: PMC10274743 DOI: 10.1101/2023.06.01.543322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected PTMs, and metadata. A total of 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides, 18267 proteins at the highest confidence level, and 3396 lower-confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for building the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the 'dark' proteome. This dark proteome is highly enriched for certain signaling peptide families (e.g., CLE, CEP, IDA, PSY) but not others (e.g., THIONIN, CAP), as well as for E3 ligases, TFs, and other proteins with unfavorable physicochemical properties. A machine learning model trained on RNA expression data and protein properties predicts the probability that a protein will be detected. The model aids in discovering proteins with short half-lives (e.g., SIG1,3 and ERF-VII TFs) and in completing the proteome. PeptideAtlas is linked to TAIR, JBrowse, PPDB, SUBA, UniProtKB, and Plant PTM Viewer.
4
van Wijk KJ, Leppert T, Sun Q, Boguraev SS, Sun Z, Mendoza L, Deutsch EW. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. THE PLANT CELL 2021; 33:3421-3453. [PMID: 34411258 PMCID: PMC8566204 DOI: 10.1093/plcell/koab211] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/03/2021] [Accepted: 08/13/2021] [Indexed: 05/02/2023]
Abstract
We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides, 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), assigned as canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS spectra.
Affiliation(s)
- Klaas J van Wijk
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
- Authors for correspondence: (K.J.V.W.), (E.W.D.)
- Tami Leppert
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Qi Sun
- Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, USA
- Sascha S Boguraev
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
- Zhi Sun
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Luis Mendoza
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Authors for correspondence: (K.J.V.W.), (E.W.D.)
5
Tholey A, Taylor NL, Heazlewood JL, Bendixen E. We Are Not Alone: The iMOP Initiative and Its Roles in a Biology- and Disease-Driven Human Proteome Project. J Proteome Res 2017; 16:4273-4280. [DOI: 10.1021/acs.jproteome.7b00408] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Affiliation(s)
- Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
- Nicolas L. Taylor
- Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences and Institute of Agriculture, The University of Western Australia, Crawley, Western Australia 6009, Australia
- Joshua L. Heazlewood
- School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Emøke Bendixen
- Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, 8000 Aarhus, Denmark