1
Hwang H, Park GW, Park JY, Lee HK, Lee JY, Jeong JE, Park SKR, Yates JR, Kwon KH, Park YM, Lee HJ, Paik YK, Kim JY, Yoo JS. Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases. J Proteome Res 2017; 16:4425-4434. [PMID: 28965411 DOI: 10.1021/acs.jproteome.7b00223] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022]
Abstract
The Human Proteome Project aims to map all human proteins, including missing proteins as well as proteoforms with post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are usually used to provide curated information on human coding genes. To find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline that uses customized, concatenated neXtProt and GENCODE (derived from Ensembl) databases with a controlled false discovery rate (FDR). Because the databases used in this pipeline are large, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was required to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MSV000081385) and ProteomeXchange (PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two additional missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr. 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr. 19), were identified with two or more unique peptides from human brain tissues. Applying the pipeline to other human brain data sets, including cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), identified seven GENCODE ASVs in two or more data sets: ACTN4-012 (Chr. 19), DPYSL2-005 (Chr. 8), MPRIP-003 (Chr. 17), NCAM1-013 (Chr. 11), EPB41L1-017 (Chr. 20), AGAP1-004 (Chr. 2), and CPNE5-005 (Chr. 6). The identified peptides of these GENCODE ASVs mapped onto novel exon insertions, alternative translation in the 5'-untranslated region, or novel protein-coding sequences. Applying the pipeline to data sets from male reproductive organs identified 52 GENCODE ASVs from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, RAB11FIP5-008 (Chr. 2), RP13-347D8.7-001 (Chr. X), PRDX4-002 (Chr. X), and RP11-666A8.13-001 (Chr. 17), were identified in all three samples.
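The abstract states the two stringency levels (0.1% peptide FDR, 1% protein FDR) but not how FDR is estimated. A common approach for searches against large concatenated databases is target-decoy estimation, so the following is a minimal Python sketch assuming that strategy, with FDR estimated as decoys/targets among hits scoring at least as well and converted to monotone q-values. The `Hit` class, `filter_at_fdr` function, and the two threshold constants are hypothetical illustrations, not part of nextPP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One scored identification (peptide or protein); higher score is better."""
    name: str
    score: float
    is_decoy: bool


def filter_at_fdr(hits: List[Hit], threshold: float) -> List[Hit]:
    """Return target hits whose target-decoy q-value is <= threshold.

    Assumes a target-decoy search; ties in score are not grouped.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    # Running FDR estimate at each rank: decoys / targets scoring at least this well.
    fdrs: List[float] = []
    decoys = targets = 0
    for h in ranked:
        if h.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value = smallest FDR at this rank or any lower-scoring rank,
    # which makes q-values non-decreasing down the ranked list.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [h for h, q in zip(ranked, qvals) if q <= threshold and not h.is_decoy]


# Hypothetical thresholds mirroring the stringency levels in the abstract.
PEPTIDE_FDR = 0.001  # 0.1% at the peptide level
PROTEIN_FDR = 0.01   # 1% at the protein level
```

In this sketch the same function would be called twice, for example `filter_at_fdr(peptide_hits, PEPTIDE_FDR)` on peptide-level scores and `filter_at_fdr(protein_hits, PROTEIN_FDR)` on protein-level scores; how protein scores are derived from peptides is left open here.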
Affiliation(s)
- Heeyoun Hwang
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
- Gun Wook Park
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
- Ji Yeong Park
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
  - Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea
- Hyun Kyoung Lee
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
  - Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea
- Ju Yeon Lee
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
- Ji Eun Jeong
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
  - Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea
- Sung-Kyu Robin Park
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
- John R Yates
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
- Kyung-Hoon Kwon
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
- Young Mok Park
  - Center for Cognition and Sociality, Institute for Basic Science, Daejeon, Republic of Korea
- Hyoung-Joo Lee
  - Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Young-Ki Paik
  - Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Jin Young Kim
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
- Jong Shin Yoo
  - Biomedical Omics Group, Korea Basic Science Institute, Cheongju 28119, Republic of Korea
  - Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea
2
Abstract
Glycoproteins influence numerous indispensable biological functions, and changes in protein glycosylation have been observed in various diseases. The identification and characterization of glycoproteins and glycosylation sites by mass spectrometry (MS) remain challenging, and great efforts have been devoted to developing proteome informatics tools that facilitate the MS analysis of glycans and glycopeptides. Here we report on the development of gFinder, a web-based bioinformatics tool that analyzes mixtures of native N-glycopeptides profiled by tandem MS. gFinder not only integrates collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) fragmentation data simultaneously but also merges the spectra for high-throughput analysis. These merged spectra expedite the identification of both glycans and N-glycopeptide backbones in tandem MS data using a glycan database and a proteomic search tool (e.g., Mascot). The resulting data can be used to characterize peptide backbone sequences and possible N-glycan structures simultaneously using assigned scores. gFinder also provides convenient functions for performing manual calculations while viewing a spectrum on-screen. Using gFinder, we detected an additional protein (Q8N9B8) that had been missed in a previously published N-linked glycosylation data set. For N-glycan analysis, we used the GlycomeDB glycan structure database, which integrates structural and taxonomic data from all of the major carbohydrate databases available in the public domain. gFinder is thus a convenient, high-throughput tool for interpreting tandem mass spectra of N-glycopeptides and can be used to identify potential missing proteins that carry glycans. gFinder is publicly available at http://gFinder.proteomix.org/.
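The abstract does not describe how gFinder scores glycan candidates against GlycomeDB structures. As a simpler illustration of the underlying mass arithmetic, the sketch below enumerates monosaccharide compositions (not structures) whose summed residue masses explain the gap between an observed glycopeptide mass and a candidate peptide-backbone mass. This composition search is a swapped-in technique, not gFinder's algorithm; the function names, the 10 ppm default tolerance, and the per-residue count bounds are hypothetical choices. The residue masses are standard monoisotopic values.

```python
from itertools import product
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses (Da) of common N-glycan monosaccharides.
RESIDUE_MASS: Dict[str, float] = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}


def glycan_mass(comp: Dict[str, int]) -> float:
    """Mass added to the peptide by a glycan of this composition.

    Each glycosidic bond and the Asn linkage release one water, so the
    added mass is just the sum of residue masses.
    """
    return sum(RESIDUE_MASS[k] * n for k, n in comp.items())


def match_compositions(
    glycopeptide_mass: float,
    peptide_mass: float,
    ppm: float = 10.0,  # hypothetical tolerance
    max_counts: Tuple[int, int, int, int] = (7, 12, 3, 4),  # hypothetical bounds
) -> List[Dict[str, int]]:
    """Enumerate compositions whose mass explains the glycopeptide - peptide gap."""
    delta = glycopeptide_mass - peptide_mass
    tol = glycopeptide_mass * ppm * 1e-6  # ppm relative to the glycopeptide mass
    names = list(RESIDUE_MASS)  # HexNAc, Hex, Fuc, NeuAc (matches max_counts order)
    hits = []
    for counts in product(*(range(m + 1) for m in max_counts)):
        comp = dict(zip(names, counts))
        if comp["HexNAc"] < 2 or comp["Hex"] < 3:
            continue  # N-glycans share the HexNAc2Hex3 core
        if abs(glycan_mass(comp) - delta) <= tol:
            hits.append(comp)
    return hits
```

Brute-force enumeration is used here only because the bounded search space is small (a few thousand compositions); a structure database such as GlycomeDB would instead supply the candidate list directly.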
Affiliation(s)
- Ju-Wan Kim
  - Graduate Program in Functional Genomics, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
  - Yonsei Proteome Research Center, Seoul 03722, Korea
- Heeyoun Hwang
  - Korea Basic Science Institute, Ochang 28199, Chungbuk, Korea
- Jong-Sun Lim
  - Yonsei Proteome Research Center, Seoul 03722, Korea
- Seul-Ki Jeong
  - Yonsei Proteome Research Center, Seoul 03722, Korea
- Jong Shin Yoo
  - Korea Basic Science Institute, Ochang 28199, Chungbuk, Korea
- Young-Ki Paik
  - Graduate Program in Functional Genomics, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea
  - Yonsei Proteome Research Center, Seoul 03722, Korea