151
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations. BMC Bioinformatics 2021; 22:277. [PMID: 34039272 PMCID: PMC8157683 DOI: 10.1186/s12859-021-04159-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
Background: Small Proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions.
Results: We observe that number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04159-8.
152
Halperin RF, Hegde A, Lang JD, Raupach EA, Legendre C, Liang WS, LoRusso PM, Sekulic A, Sosman JA, Trent JM, Rangasamy S, Pirrotte P, Schork NJ. Improved methods for RNAseq-based alternative splicing analysis. Sci Rep 2021; 11:10740. [PMID: 34031440 PMCID: PMC8144374 DOI: 10.1038/s41598-021-89938-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/31/2020] [Accepted: 04/13/2021] [Indexed: 01/04/2023] Open
Abstract
The robust detection of disease-associated splice events from RNAseq data is challenging due to the potential confounding effect of gene expression levels and the often limited number of patients with relevant RNAseq data. Here we present a novel statistical approach to splicing outlier detection and differential splicing analysis. Our approach tests for differences in the percentages of sequence reads representing local splice events. We describe a software package called Bisbee which can predict the protein-level effect of splice alterations, a key feature lacking in many other splicing analysis resources. We leverage Bisbee's prediction of protein level effects as a benchmark of its capabilities using matched sets of RNAseq and mass spectrometry data from normal tissues. Bisbee exhibits improved sensitivity and specificity over existing approaches and can be used to identify tissue-specific splice variants whose protein-level expression can be confirmed by mass spectrometry. We also applied Bisbee to assess evidence for a pathogenic splicing variant contributing to a rare disease and to identify tumor-specific splice isoforms associated with an oncogenic mutation. Bisbee was able to rediscover previously validated results in both of these cases and also identify common tumor-associated splice isoforms replicated in two independent melanoma datasets.
Affiliation(s)
- Rebecca F Halperin
- Quantitative Medicine and Systems Biology Division, Translational Genomics Research Institute, Phoenix, AZ, USA.
- Apurva Hegde
- Collaborative Center for Translational Mass Spectrometry, Translational Genomics Research Institute, Phoenix, AZ, USA
- Jessica D Lang
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Elizabeth A Raupach
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Christophe Legendre
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Winnie S Liang
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Neurogenomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Jeffrey M Trent
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, USA
- Patrick Pirrotte
- Collaborative Center for Translational Mass Spectrometry, Translational Genomics Research Institute, Phoenix, AZ, USA
- Nicholas J Schork
- Quantitative Medicine and Systems Biology Division, Translational Genomics Research Institute, Phoenix, AZ, USA
153
Su M, Zhang Z, Zhou L, Han C, Huang C, Nice EC. Proteomics, Personalized Medicine and Cancer. Cancers (Basel) 2021; 13:2512. [PMID: 34063807 PMCID: PMC8196570 DOI: 10.3390/cancers13112512] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/03/2021] [Revised: 05/12/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023] Open
Abstract
As of 2020 the human genome and proteome are both at >90% completion based on high stringency analyses. This has been largely achieved by major technological advances over the last 20 years and has enlarged our understanding of human health and disease, including cancer, and is supporting the current trend towards personalized/precision medicine. This is due to improved screening, novel therapeutic approaches and an increased understanding of underlying cancer biology. However, cancer is a complex, heterogeneous disease modulated by genetic, molecular, cellular, tissue, population, environmental and socioeconomic factors, which evolve with time. In spite of recent advances in treatment that have resulted in improved patient outcomes, prognosis is still poor for many patients with certain cancers (e.g., mesothelioma, pancreatic and brain cancer) with a high death rate associated with late diagnosis. In this review we overview key hallmarks of cancer (e.g., autophagy, the role of redox signaling), current unmet clinical needs, the requirement for sensitive and specific biomarkers for early detection, surveillance, prognosis and drug monitoring, the role of the microbiome and the goals of personalized/precision medicine, discussing how emerging omics technologies can further inform on these areas. Exemplars from recent onco-proteogenomic-related publications will be given. Finally, we will address future perspectives, not only from the standpoint of perceived advances in treatment, but also from the hurdles that have to be overcome.
Affiliation(s)
- Miao Su
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, and West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, China; (M.S.); (Z.Z.); (L.Z.); (C.H.)
- Zhe Zhang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, and West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, China; (M.S.); (Z.Z.); (L.Z.); (C.H.)
- Li Zhou
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, and West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, China; (M.S.); (Z.Z.); (L.Z.); (C.H.)
- Chao Han
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, and West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, China; (M.S.); (Z.Z.); (L.Z.); (C.H.)
- Canhua Huang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, and West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, China; (M.S.); (Z.Z.); (L.Z.); (C.H.)
- Edouard C. Nice
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
154
Tian F, Shi J, Li Y, Gao H, Chang L, Zhang Y, Gao L, Xu P, Tang S. Proteogenomics Study of Blastobotrys adeninivorans TMCC 70007 - A Dominant Yeast in the Fermentation Process of Pu-erh Tea. J Proteome Res 2021; 20:3290-3304. [PMID: 34008989 DOI: 10.1021/acs.jproteome.1c00205] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Blastobotrys adeninivorans plays an essential role in pile-fermenting of Pu-erh tea. Its ability to assimilate various carbon and nitrogen sources makes it available for application in a wide range of industry sectors. The genome of B. adeninivorans TMCC 70007 isolated from pile-fermented Pu-erh tea was sequenced and assembled. Proteomics analysis indicated that 4900 proteins in TMCC 70007 were expressed under various culture conditions. Proteogenomics mapping revealed 48 previously unknown genes and corrected 118 gene models predicted by GeneMark-ES. Ortho-proteogenomics analysis identified 17 previously unidentified genes in B. adeninivorans LS3, the first strain with a sequenced genome among the genus Blastobotrys as well. More importantly, five species specific genes were identified from TMCC 70007, which could serve as a barcode for strain typing and were applicable for fermentation process protection of this industrial species. The datasets generated from tea aqueous extract culture not only increased the proteome coverage and accuracy but also contributed to the identification of proteins related to polyphenols and caffeine, which were considered to change greatly during the microbial fermentation of Pu-erh tea. This study provides a proteome perspective on TMCC 70007, which was considered to be an important strain in the production of Pu-erh tea. The systematic proteogenomics analysis not only made a better annotation on the genome of B. adeninivorans TMCC 70007 as previous proteogenomics study but also provided solution for fermentation process protection on valuable industrial species with species specific genes uniquely identified from proteogenomics study.
Affiliation(s)
- Fei Tian
- Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, and Laboratory for Conservation and Utilization of Bio-resources, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650091, China.,State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Jiahui Shi
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.,Hebei Province Key Lab of Research and Application on Microbial Diversity, College of Life Sciences, Hebei University, Baoding 071002, China
- Yanchang Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Huiying Gao
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Lei Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yao Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Linrui Gao
- Yunnan Pu-erh Tea Fermentation Engineering Research Center, Yunnan TAETEA Microbial Technology Co., Ltd., Kunming 650217, China
- Ping Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Research Unit of Proteomics & Research and Development of New Drug, Chinese Academy of Medical Sciences, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.,Hebei Province Key Lab of Research and Application on Microbial Diversity, College of Life Sciences, Hebei University, Baoding 071002, China
- Shukun Tang
- Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, and Laboratory for Conservation and Utilization of Bio-resources, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650091, China.,Yunnan Pu-erh Tea Fermentation Engineering Research Center, Yunnan TAETEA Microbial Technology Co., Ltd., Kunming 650217, China
155
Kim JC, Lee MR, Kim S, Park SE, Lee SJ, Shin TY, Kim WJ, Kim J. Transcriptome Analysis of the Japanese Pine Sawyer Beetle, Monochamus alternatus, Infected with the Entomopathogenic Fungus Metarhizium anisopliae JEF-197. J Fungi (Basel) 2021; 7:jof7050373. [PMID: 34068801 PMCID: PMC8151162 DOI: 10.3390/jof7050373] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/04/2021] [Revised: 04/16/2021] [Accepted: 05/08/2021] [Indexed: 12/13/2022] Open
Abstract
The Japanese pine sawyer (JPS) beetle, Monochamus alternatus Hope (Coleoptera: Cerambycidae), damages pine trees and transmits the pine wilt nematode, Bursaphelenchus xylophilus Nickle. Chemical agents have been used to control JPS beetle, but due to various issues, efforts are being made to replace these chemical agents with entomopathogenic fungi. We investigated the expression of immune-related genes in JPS beetle in response to infection with JEF-197, a Metarhizium anisopliae isolate, using RNA-seq. RNA samples were obtained from JEF-197, JPS adults treated with JEF-197, and non-treated JPS adults on the 8th day after fungal treatment, and RNA-seq was performed using Illumina sequencing. JPS beetle transcriptome was assembled de novo and differentially expressed gene (DEG) analysis was performed. There were 719 and 1953 up- and downregulated unigenes upon JEF-197 infection, respectively. Upregulated contigs included genes involved in RNA transport, ribosome biogenesis in eukaryotes, spliceosome-related genes, and genes involved in immune-related signaling pathways such as the Toll and Imd pathways. Forty-two fungal DEGs related to energy and protein metabolism were upregulated, and genes involved in the stress response were also upregulated in the infected JPS beetles. Together, our results indicate that infection of JPS beetles by JEF-197 induces the expression of immune-related genes.
Affiliation(s)
- Jong-Cheol Kim
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Mi-Rong Lee
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Sihyeon Kim
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- So-Eun Park
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Se-Jin Lee
- Department of Agricultural Life Science, Sunchon National University, Suncheon 57922, Korea;
- Tae-Young Shin
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Woo-Jin Kim
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Correspondence: (W.-J.K.); (J.K.); Tel.: +82-63-270-2525 (J.K.)
- Jaesu Kim
- Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; (J.-C.K.); (M.-R.L.); (S.K.); (S.-E.P.); (T.-Y.S.)
- Department of Agricultural Convergence Technology, Jeonbuk National University, Jeonju 54596, Korea
- Correspondence: (W.-J.K.); (J.K.); Tel.: +82-63-270-2525 (J.K.)
156
Tharakan R, Sawa A. Minireview: Novel Micropeptide Discovery by Proteomics and Deep Sequencing Methods. Front Genet 2021; 12:651485. [PMID: 34025718 PMCID: PMC8136307 DOI: 10.3389/fgene.2021.651485] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/09/2021] [Accepted: 03/22/2021] [Indexed: 12/12/2022] Open
Abstract
A novel class of small proteins, called micropeptides, has recently been discovered in the genome. These proteins, which have been found to play important roles in many physiological and cellular systems, are shorter than 100 amino acids and were overlooked during previous genome annotations. Discovery and characterization of more micropeptides has been ongoing, often using -omics methods such as proteomics, RNA sequencing, and ribosome profiling. In this review, we survey the recent advances in the micropeptides field and describe the methodological and conceptual challenges facing future micropeptide endeavors.
Affiliation(s)
- Ravi Tharakan
- National Institute on Aging, National Institutes of Health, Baltimore, MD, United States
- Akira Sawa
- Departments of Psychiatry, Neuroscience, Biomedical Engineering, and Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States.,Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
157
Gupta RG, Li F, Roszik J, Lizée G. Exploiting Tumor Neoantigens to Target Cancer Evolution: Current Challenges and Promising Therapeutic Approaches. Cancer Discov 2021; 11:1024-1039. [PMID: 33722796 PMCID: PMC8102318 DOI: 10.1158/2159-8290.cd-20-1575] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 10/28/2020] [Revised: 12/16/2020] [Accepted: 12/28/2020] [Indexed: 11/16/2022]
Abstract
Immunotherapeutic manipulation of the antitumor immune response offers an attractive strategy to target genomic instability in cancer. A subset of tumor-specific somatic mutations can be translated into immunogenic and HLA-bound epitopes called neoantigens, which can induce the activation of helper and cytotoxic T lymphocytes. However, cancer immunoediting and immunosuppressive mechanisms often allow tumors to evade immune recognition. Recent evidence also suggests that the tumor neoantigen landscape extends beyond epitopes originating from nonsynonymous single-nucleotide variants in the coding exome. Here we review emerging approaches for identifying, prioritizing, and immunologically targeting personalized neoantigens using polyvalent cancer vaccines and T-cell receptor gene therapy. SIGNIFICANCE: Several major challenges currently impede the clinical efficacy of neoantigen-directed immunotherapy, such as the relative infrequency of immunogenic neoantigens, suboptimal potency and priming of de novo tumor-specific T cells, and tumor cell-intrinsic and -extrinsic mechanisms of immune evasion. A deeper understanding of these biological barriers could help facilitate the development of effective and durable immunotherapy for any type of cancer, including immunologically "cold" tumors that are otherwise therapeutically resistant.
Affiliation(s)
- Ravi G Gupta
- Department of Hematology/Oncology, MD Anderson Cancer Center at Cooper, Camden, New Jersey.
- Fenge Li
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Jason Roszik
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Gregory Lizée
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas.
- Department of Immunology, The University of Texas MD Anderson Cancer Center, Houston, Texas
158
Cheng F, De Luca A, Hogan AL, Rayner SL, Davidson JM, Watchon M, Stevens CH, Muñoz SS, Ooi L, Yerbury JJ, Don EK, Fifita JA, Villalva MD, Suddull H, Chapman TR, Hedl TJ, Walker AK, Yang S, Morsch M, Shi B, Blair IP, Laird AS, Chung RS, Lee A. Unbiased Label-Free Quantitative Proteomics of Cells Expressing Amyotrophic Lateral Sclerosis (ALS) Mutations in CCNF Reveals Activation of the Apoptosis Pathway: A Workflow to Screen Pathogenic Gene Mutations. Front Mol Neurosci 2021; 14:627740. [PMID: 33986643 PMCID: PMC8111008 DOI: 10.3389/fnmol.2021.627740] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/10/2020] [Accepted: 03/19/2021] [Indexed: 12/12/2022] Open
Abstract
The past decade has seen a rapid acceleration in the discovery of new genetic causes of ALS, with more than 20 putative ALS-causing genes now cited. These genes encode proteins that cover a diverse range of molecular functions, including free radical scavenging (e.g., SOD1), regulation of RNA homeostasis (e.g., TDP-43 and FUS), and protein degradation through the ubiquitin-proteasome system (e.g., ubiquilin-2 and cyclin F) and autophagy (TBK1 and sequestosome-1/p62). It is likely that the various initial triggers of disease (either genetic, environmental and/or gene-environment interaction) must converge upon a common set of molecular pathways that underlie ALS pathogenesis. Given the complexity, it is not surprising that a catalog of molecular pathways and proteostasis dysfunctions have been linked to ALS. One of the challenges in ALS research is determining, at the early stage of discovery, whether a new gene mutation is indeed disease-specific, and if it is linked to signaling pathways that trigger neuronal cell death. We have established a proof-of-concept proteogenomic workflow to assess new gene mutations, using CCNF (cyclin F) as an example, in cell culture models to screen whether potential gene candidates fit the criteria of activating apoptosis. This can provide an informative and time-efficient output that can be extended further for validation in a variety of in vitro and in vivo models and/or for mechanistic studies. As a proof-of-concept, we expressed cyclin F mutations (K97R, S195R, S509P, R574Q, S621G) in HEK293 cells for label-free quantitative proteomics that bioinformatically predicted activation of the neuronal cell death pathways, which was validated by immunoblot analysis. 
Proteomic analysis of induced pluripotent stem cells (iPSCs) derived from patient fibroblasts bearing the S621G mutation showed the same activation of these pathways providing compelling evidence for these candidate gene mutations to be strong candidates for further validation and mechanistic studies (such as E3 enzymatic activity assays, protein-protein and protein-substrate studies, and neuronal apoptosis and aberrant branching measurements in zebrafish). Our proteogenomics approach has great utility and provides a relatively high-throughput screening platform to explore candidate gene mutations for their propensity to cause neuronal cell death, which will guide a researcher for further experimental studies.
Affiliation(s)
- Flora Cheng
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Alana De Luca
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Alison L Hogan
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Stephanie L Rayner
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Jennilee M Davidson
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Maxinne Watchon
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Claire H Stevens
- Illawarra Health and Medical Research Institute (IHMRI), University of Wollongong, Wollongong, NSW, Australia.,School of Chemistry and Molecular Bioscience and Molecular Horizons, University of Wollongong, Wollongong, NSW, Australia
- Sonia Sanz Muñoz
- Illawarra Health and Medical Research Institute (IHMRI), University of Wollongong, Wollongong, NSW, Australia.,School of Chemistry and Molecular Bioscience and Molecular Horizons, University of Wollongong, Wollongong, NSW, Australia
- Lezanne Ooi
- Illawarra Health and Medical Research Institute (IHMRI), University of Wollongong, Wollongong, NSW, Australia.,School of Chemistry and Molecular Bioscience and Molecular Horizons, University of Wollongong, Wollongong, NSW, Australia
- Justin J Yerbury
- Illawarra Health and Medical Research Institute (IHMRI), University of Wollongong, Wollongong, NSW, Australia.,School of Chemistry and Molecular Bioscience and Molecular Horizons, University of Wollongong, Wollongong, NSW, Australia
- Emily K Don
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Jennifer A Fifita
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Maria D Villalva
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Hannah Suddull
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Tyler R Chapman
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Thomas J Hedl
- Neurodegeneration Pathobiology Laboratory, Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Adam K Walker
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia.,Neurodegeneration Pathobiology Laboratory, Queensland Brain Institute, The University of Queensland, St Lucia, QLD, Australia
- Shu Yang
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Marco Morsch
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Bingyang Shi
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Ian P Blair
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Angela S Laird
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Roger S Chung
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
- Albert Lee
- Centre for Motor Neuron Disease Research, Department of Biomedical Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, North Ryde, NSW, Australia
159
Braconi D, Bernardini G, Spiga O, Santucci A. Leveraging proteomics in orphan disease research: pitfalls and potential. Expert Rev Proteomics 2021; 18:315-327. [PMID: 33861161 DOI: 10.1080/14789450.2021.1918549] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Introduction: The term 'orphan diseases' includes conditions meeting prevalence-based or commercial viability criteria: they affect a small number of individuals and are considered an unviable market for drug development. Proteomics is an important technology to study them, providing information on mechanisms and evolution, biomarkers, and effects of therapeutic interventions. Areas covered: Herein, we review how proteomics and bioinformatic tools could be applied to the study of rare diseases and discuss pitfalls and potential. Expert opinion: Research in the field of rare diseases has to face many challenges, and implementation plans should foresee highly specialized collaborative consortia to create multidisciplinary frameworks for data sharing, advancing research, supporting clinical studies, and accelerating drug development. The integration of different technologies will allow better knowledge of disease pathophysiology, and the inclusion of proteomics and other omics technologies in this context will be pivotal to this aim. Several aspects of rare diseases, often perceived as limiting factors, might actually be advantages for a precision medicine approach: the limited number of patients, the collaboration with patient societies, and the availability of curated clinical registries could allow the development of homogeneous clinical databases and ultimately better control over the data to be analyzed.
Affiliation(s)
- Daniela Braconi
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Giulia Bernardini
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Ottavia Spiga
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Annalisa Santucci
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
160
Tolani P, Gupta S, Yadav K, Aggarwal S, Yadav AK. Big data, integrative omics and network biology. Advances in Protein Chemistry and Structural Biology 2021; 127:127-160. [PMID: 34340766 DOI: 10.1016/bs.apcsb.2021.03.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/12/2022]
Abstract
A cell integrates various signals through a network of biomolecules that crosstalk to synergistically regulate replication, transcription, translation and other metabolic activities. These networks regulate the signal perception and processing that drive biological functions. This biological complexity cannot be fully captured by a single -omics discipline. The holistic study of an organism in health, perturbation, environmental exposure and disease falls under systems biology. Bottom-up molecular approaches (genes, mRNA, protein, metabolite, etc.) have laid the foundation of current biological knowledge, covering viruses, bacteria, fungi, plants and animals. Yet, these techniques provide a rather myopic view of biology at the molecular level. Understanding how the interconnected molecular components are formed and rewired in disease or on exposure to environmental stimuli is the holy grail of modern biology. The omics era was heralded by the genomics revolution but advanced sequencing techniques are now also ubiquitous in transcriptomics, proteomics, metabolomics and lipidomics. Multi-omics data analysis and integration techniques are driving the quest for deeper insights into how the different layers of biomolecules talk to each other in diverse contexts.
Affiliation(s)
- Priya Tolani
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Srishti Gupta
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- Kirti Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Pharmaceutical Biotechnology, Delhi Pharmaceutical Sciences and Research University, New Delhi, India
- Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
- Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India.
161
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022] Open
Abstract
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.
162
Moaddel R, Ubaida‐Mohien C, Tanaka T, Lyashkov A, Basisty N, Schilling B, Semba RD, Franceschi C, Gorospe M, Ferrucci L. Proteomics in aging research: A roadmap to clinical, translational research. Aging Cell 2021; 20:e13325. [PMID: 33730416 PMCID: PMC8045948 DOI: 10.1111/acel.13325] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Received: 10/25/2020] [Revised: 12/31/2020] [Accepted: 01/18/2021] [Indexed: 02/06/2023] Open
Abstract
The identification of plasma proteins that systematically change with age and, independent of chronological age, predict accelerated decline of health is an expanding area of research. Circulating proteins are ideal translational "omics" since they are final effectors of physiological pathways and because physicians are accustomed to using information on plasma proteins as biomarkers for diagnosis, prognosis, and tracking the effectiveness of treatments. Recent technological advancements, including mass spectrometry (MS)-based proteomics, multiplexed proteomic assay using modified aptamers (SOMAscan), and Proximity Extension Assay (PEA, O-Link), have allowed for the assessment of thousands of proteins in plasma or other biological matrices, which are potentially translatable into new clinical biomarkers and provide new clues about the mechanisms by which aging is associated with health deterioration and functional decline. We carried out a detailed literature search for proteomic studies performed in different matrices (plasma, serum, urine, saliva, tissues) and species using multiple platforms. Herein, we identified 232 proteins that were age-associated across studies. Enrichment analysis of the 232 age-associated proteins revealed metabolic pathways previously connected with biological aging both in animal models and in humans, most remarkably insulin-like growth factor (IGF) signaling, mitogen-activated protein kinases (MAPK), hypoxia-inducible factor 1 (HIF1), cytokine signaling, Forkhead Box O (FOXO) metabolic pathways, folate metabolism, advanced glycation end products (AGE), and the receptor for AGE (RAGE) metabolic pathway. Information on these age-relevant proteins, likely expanded and validated in longitudinal studies and examined in mechanistic studies, will be essential for patient stratification and the development of new treatments aimed at improving health expectancy.
Affiliation(s)
- Ruin Moaddel
- Biomedical Research Centre, National Institute on Aging, NIH, Baltimore, MD, USA
- Toshiko Tanaka
- Biomedical Research Centre, National Institute on Aging, NIH, Baltimore, MD, USA
- Alexey Lyashkov
- Biomedical Research Centre, National Institute on Aging, NIH, Baltimore, MD, USA
- Richard D Semba
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Claudio Franceschi
- University of Bologna and IRCCS Institute of Neurological Sciences, Bologna, Italy
- Myriam Gorospe
- Biomedical Research Centre, National Institute on Aging, NIH, Baltimore, MD, USA
- Luigi Ferrucci
- Biomedical Research Centre, National Institute on Aging, NIH, Baltimore, MD, USA
163
Pan Y, Kadash-Edmondson KE, Wang R, Phillips J, Liu S, Ribas A, Aplenc R, Witte ON, Xing Y. RNA Dysregulation: An Expanding Source of Cancer Immunotherapy Targets. Trends Pharmacol Sci 2021; 42:268-282. [PMID: 33711255 PMCID: PMC8761020 DOI: 10.1016/j.tips.2021.01.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 09/07/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 12/14/2022]
Abstract
Cancer transcriptomes frequently exhibit RNA dysregulation. As the resulting aberrant transcripts may be translated into cancer-specific proteins, there is growing interest in exploiting RNA dysregulation as a source of tumor antigens (TAs) and thus novel immunotherapy targets. Recent advances in high-throughput technologies and rapid accumulation of multiomic cancer profiling data in public repositories have provided opportunities to systematically characterize RNA dysregulation in cancer and identify antigen targets for immunotherapy. However, given the complexity of cancer transcriptomes and proteomes, important conceptual and technological challenges exist. Here, we highlight the expanding repertoire of TAs arising from RNA dysregulation and introduce multiomic and big data strategies for identifying optimal immunotherapy targets. We discuss extant barriers for translating these targets into effective therapies as well as the implications for future research.
Affiliation(s)
- Yang Pan
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kathryn E Kadash-Edmondson
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Robert Wang
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- John Phillips
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA
- Antoni Ribas
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Surgery, University of California, Los Angeles, Los Angeles, CA 90095, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Richard Aplenc
- Division of Oncology, Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Owen N Witte
- Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yi Xing
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
164
Cao X, Xing J. PrecisionProDB: improving the proteomics performance for precision medicine. Bioinformatics 2021; 37:3361-3363. [PMID: 33787868 DOI: 10.1093/bioinformatics/btab218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Revised: 03/06/2021] [Accepted: 03/30/2021] [Indexed: 01/03/2023] Open
Abstract
SUMMARY As next-generation sequencing technology becomes broadly applied, genomics and transcriptomics are becoming more commonly used in both research and clinical settings. However, proteomics is still an obstacle to be conquered. For most peptide search programs in proteomics, a standard reference protein database is used. Because of the thousands of coding DNA variants in each individual, a standard reference database does not provide a perfect match for many proteins/peptides of an individual. A personalized reference database can improve the detection power and accuracy for individual proteomics data. To connect genomics and proteomics, we designed a Python package, PrecisionProDB, that is specialized for generating a personalized protein database for proteomics applications. PrecisionProDB supports multiple popular file formats and reference databases, and can generate a personalized database in minutes. To demonstrate the application of PrecisionProDB, we generated human population-specific reference protein databases with PrecisionProDB, which improve the number of identified peptides by 0.34% on average. In addition, by incorporating cell line-specific variants into the protein database, we demonstrated a 0.71% improvement for peptide identification in the Jurkat cell line. With PrecisionProDB and these datasets, researchers and clinicians can improve their peptide search performance by adopting the more representative protein database or adding population- and individual-specific proteins to the search database with minimal additional effort. AVAILABILITY PrecisionProDB and pre-calculated protein databases are freely available at https://github.com/ATPs/PrecisionProDB and https://github.com/ATPs/PrecisionProDB_references. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
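The idea of a personalized protein database can be illustrated with a minimal sketch. It does not use or reproduce the PrecisionProDB API; the function name `apply_substitutions`, the toy sequence, and the variant list are all hypothetical, and for brevity the variants are given directly as amino-acid substitutions rather than as the coding DNA variants the abstract describes.

```python
def apply_substitutions(reference, substitutions):
    """Return a variant protein sequence.

    reference: reference protein sequence (one-letter amino acid codes).
    substitutions: iterable of (position, ref_aa, alt_aa), position 1-based.
    Hypothetical helper for illustration only, not part of PrecisionProDB.
    """
    residues = list(reference)
    for position, ref_aa, alt_aa in substitutions:
        # Guard against variants that do not match the reference sequence.
        if residues[position - 1] != ref_aa:
            raise ValueError(
                f"expected {ref_aa} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = alt_aa
    return "".join(residues)


# Toy reference protein and two made-up individual-specific substitutions.
reference = "MKTAYIAKQR"
personalized = apply_substitutions(reference, [(4, "A", "V"), (9, "Q", "H")])
print(personalized)
```

A search database built this way would contain the variant sequence alongside, or instead of, the reference entry, so that peptides spanning the substituted residues can be matched.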
Affiliation(s)
- Xiaolong Cao
- Department of Genetics, Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Jinchuan Xing
- Department of Genetics, Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
165
Isaacs AM, Morton SU, Movassagh M, Zhang Q, Hehnly C, Zhang L, Morales DM, Sinnar SA, Ericson JE, Mbabazi-Kabachelor E, Ssenyonga P, Onen J, Mulondo R, Hornig M, Warf BC, Broach JR, Townsend RR, Limbrick DD, Paulson JN, Schiff SJ. Immune activation during Paenibacillus brain infection in African infants with frequent cytomegalovirus co-infection. iScience 2021; 24:102351. [PMID: 33912816 PMCID: PMC8065213 DOI: 10.1016/j.isci.2021.102351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2020] [Revised: 02/24/2021] [Accepted: 03/19/2021] [Indexed: 12/16/2022] Open
Abstract
Inflammation during neonatal brain infections leads to significant secondary sequelae such as hydrocephalus, which often follows neonatal sepsis in the developing world. In 100 African hydrocephalic infants we identified the biological pathways that account for this response. The dominant bacterial pathogen was a Paenibacillus species, with frequent cytomegalovirus co-infection. A proteogenomic strategy was employed to confirm host immune response to Paenibacillus and to define the interplay within the host immune response network. Immune activation emphasized neuroinflammation, oxidative stress reaction, and extracellular matrix organization. The innate immune system response included neutrophil activity, signaling via IL-4, IL-12, IL-13, interferon, and Jak/STAT pathways. Platelet-activating factors and factors involved with microbe recognition such as Class I MHC antigen-presenting complex were also increased. Evidence suggests that dysregulated neuroinflammation propagates inflammatory hydrocephalus, and these pathways are potential targets for adjunctive treatments to reduce the hazards of neuroinflammation and risk of hydrocephalus following neonatal sepsis.
- There is a characteristic immune response to Paenibacillus brain infection
- There is a characteristic immune response to CMV brain infection
- The matching immune response validates pathogen genomic presence
- The combined results support molecular infection causality
Affiliation(s)
- Albert M Isaacs
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Clinical Neurosciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Sarah U Morton
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- Mercedeh Movassagh
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Qiang Zhang
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Christine Hehnly
- Institute for Personalized Medicine, Pennsylvania State University, Hershey, PA 17033, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, State College, PA 16801, USA
- Lijun Zhang
- Institute for Personalized Medicine, Pennsylvania State University, Hershey, PA 17033, USA
- Diego M Morales
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO 63110, USA
- Shamim A Sinnar
- Center for Neural Engineering, Pennsylvania State University, State College, PA 16801, USA; Department of Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Jessica E Ericson
- Department of Pediatrics, Pennsylvania State College of Medicine, Hershey, PA 17033, USA
- Justin Onen
- CURE Children's Hospital of Uganda, Mbale, Uganda
- Mady Hornig
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Benjamin C Warf
- Department of Neurosurgery, Harvard Medical School, Boston, MA 02115, USA
- James R Broach
- Institute for Personalized Medicine, Pennsylvania State University, Hershey, PA 17033, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, State College, PA 16801, USA
- R Reid Townsend
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- David D Limbrick
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO 63110, USA
- Joseph N Paulson
- Department of Biostatistics, Product Development, Genentech Inc., South San Francisco, CA 94080, USA
- Steven J Schiff
- Center for Neural Engineering, Pennsylvania State University, State College, PA 16801, USA; Center for Infectious Disease Dynamics, Departments of Neurosurgery, Engineering Science and Mechanics, and Physics, The Pennsylvania State University, University Park, PA 16802, USA
166
Ruiz Cuevas MV, Hardy MP, Hollý J, Bonneil É, Durette C, Courcelles M, Lanoix J, Côté C, Staudt LM, Lemieux S, Thibault P, Perreault C, Yewdell JW. Most non-canonical proteins uniquely populate the proteome or immunopeptidome. Cell Rep 2021; 34:108815. [PMID: 33691108 PMCID: PMC8040094 DOI: 10.1016/j.celrep.2021.108815] [Citation(s) in RCA: 109] [Impact Index Per Article: 36.3] [Received: 08/03/2020] [Revised: 01/29/2021] [Accepted: 02/10/2021] [Indexed: 12/16/2022] Open
Abstract
Combining RNA sequencing, ribosome profiling, and mass spectrometry, we elucidate the contribution of non-canonical translation to the proteome and major histocompatibility complex (MHC) class I immunopeptidome. Remarkably, of 14,498 proteins identified in three human B cell lymphomas, 2,503 are non-canonical proteins. Of these, 28% are novel isoforms and 72% are cryptic proteins encoded by ostensibly non-coding regions (60%) or frameshifted canonical genes (12%). Cryptic proteins are translated as efficiently as canonical proteins, have more predicted disordered residues and lower stability, and critically generate MHC-I peptides 5-fold more efficiently per translation event. Translating 5' "untranslated" regions hinders downstream translation of genes involved in transcription, translation, and antiviral responses. Novel protein isoforms show strong enrichment for signaling pathways deregulated in cancer. Only a small fraction of cryptic proteins detected in the proteome contribute to the MHC-I immunopeptidome, demonstrating the high preferential access of cryptic defective ribosomal products to the class I pathway.
Affiliation(s)
- Maria Virginia Ruiz Cuevas
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada
- Marie-Pierre Hardy
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Jaroslav Hollý
- Cellular Biology Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Éric Bonneil
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Chantal Durette
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Mathieu Courcelles
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Joël Lanoix
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Caroline Côté
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
- Louis M Staudt
- Lymphoid Malignancies Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sébastien Lemieux
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada
- Pierre Thibault
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Chemistry, Université de Montréal, Montreal, QC H3C 3J7, Canada
- Claude Perreault
- Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada.
- Jonathan W Yewdell
- Cellular Biology Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.
167
Choi S, Paek E. MutCombinator: identification of mutated peptides allowing combinatorial mutations using nucleotide-based graph search. Bioinformatics 2021; 36:i203-i209. [PMID: 32657416 PMCID: PMC7355298 DOI: 10.1093/bioinformatics/btaa504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/05/2023] Open
Abstract
Motivation Proteogenomics has proven its utility by integrating genomics and proteomics. Typical approaches use data from next-generation sequencing to infer proteins expressed. A sample-specific protein sequence database is often adopted to identify novel peptides from matched mass spectrometry-based proteomics; nevertheless, there is no software that can practically identify all possible forms of mutated peptides suggested by various genomic information sources. Results We propose MutCombinator, which enables us to practically identify mutated peptides from tandem mass spectra allowing combinatorial mutations during the database search. It uses an upgraded version of a variant graph, keeping track of frame information. The variant graph is indexed by nine nucleotides for fast access. Using MutCombinator, we could identify more mutated peptides than previous methods, because combinations of point mutations are considered and also because it can be practically applied together with a large mutation database such as COSMIC. Furthermore, MutCombinator supports in-frame search for coding regions and three-frame search for non-coding regions. Availability and implementation https://prix.hanyang.ac.kr/download/mutcombinator.jsp. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Seunghyuk Choi
- Department of Computer Science, Hanyang University, Seongdong-gu, Seoul 04763, Republic of Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seongdong-gu, Seoul 04763, Republic of Korea
168
Du Z, Shao W, Qin W. [Research progress and application of retention time prediction method based on deep learning]. Se Pu 2021; 39:211-218. [PMID: 34227303 PMCID: PMC9403805 DOI: 10.3724/sp.j.1123.2020.08015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Indexed: 11/25/2022] Open
Abstract
In the "shotgun" proteomics strategy, the proteome is characterized by analyzing tryptically digested peptides using liquid chromatography-mass spectrometry. In this strategy, the retention time of peptides in liquid chromatography separation can be predicted based on the peptide sequence. This is a useful feature for peptide identification. Therefore, the prediction of the retention time has attracted much research attention. Traditional methods calculate the physical and chemical properties of the peptides based on their amino acid sequence to obtain the retention time under certain chromatography conditions; however, these methods cannot be directly adopted for other chromatography conditions, nor can they be used across laboratories or instrument platforms. To solve this problem, in recent years, deep learning was introduced to proteomics research for retention time prediction. Deep learning is an advanced machine-learning method that has extraordinary capability to learn complex relationships from large-scale data. By stacking multiple hidden network layers, deep learning can ingest raw data without manually designed features. Transfer learning is an important method in deep learning. It improves the learning of a new task through the transfer of knowledge from an already-learned related task. Transfer learning allows models trained using large datasets to be utilized across conditions by fine-tuning on smaller datasets, instead of retraining the whole model. Many retention time prediction methods have been developed. In the process of training the model, the sequences of peptides are encoded to represent peptide information. Deep learning considers the relationship between the characteristics of the peptides and their corresponding retention times without the need for manual input of the physical and chemical properties of the peptides.
Compared with traditional methods, deep learning methods have higher accuracy and can be easily used under different chromatography conditions by transfer learning. If there are not enough datasets to train a new model, a trained model from other datasets can be used as a replacement after calibration with small datasets obtained from these chromatography conditions. While the retention times of modified peptides can also be predicted, the predictions are inadequate for complex modifications such as glycosylation, and this is one of the main problems to be solved. The predicted retention times were used to control the quality of peptide identification. With high accuracy, the predicted retention times can be considered as actual retention times. Therefore, the difference between predicted and observed retention times can serve as an effective and unbiased quantitative metric for evaluating the quality of peptide-spectrum matches (PSMs) reported using different peptide identification methods. Combined with fragment ion intensity prediction, retention time prediction is used to generate spectral libraries for data-independent acquisition (DIA)-based mass spectrometry analysis. Generally, DIA methods identify peptides using specific spectrum libraries obtained from data-dependent acquisition (DDA) experiments. As a result, only peptides detected in the DDA experiments can be present in the libraries and detected in DIA. Furthermore, it takes a lot of time and effort to build libraries from DDA experiments, and typically, they cannot be adopted across different laboratories or instrument platforms. In contrast, the pseudo spectral libraries generated by retention times and fragment ion intensity prediction can overcome these shortcomings. The pseudo spectral libraries generate theoretical spectra of all possible peptides without the need for DDA experiments. 
This paper reviews the research progress of deep learning methods in the prediction of retention time and in related applications in order to provide references for retention time prediction and protein identification. At the same time, the development direction and application trend of retention time prediction methods based on deep learning are discussed.
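The idea of using the difference between predicted and observed retention times as a quality metric for PSMs can be illustrated with a minimal sketch. It assumes that predictions are already available from some retention time model; the function names, the tolerance, and the toy values are hypothetical and are not taken from the reviewed work.

```python
from statistics import median


def delta_rt(observed, predicted):
    """Return the absolute RT deviation (in minutes) for each PSM."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [abs(o - p) for o, p in zip(observed, predicted)]


def flag_outliers(deltas, tolerance):
    """Mark PSMs whose RT deviation exceeds a user-chosen tolerance."""
    return [d > tolerance for d in deltas]


# Toy retention times in minutes (hypothetical values).
observed_rt = [12.4, 35.1, 58.9, 20.0]
predicted_rt = [12.0, 34.8, 45.0, 20.5]

deltas = delta_rt(observed_rt, predicted_rt)
suspicious = flag_outliers(deltas, tolerance=5.0)  # tolerance is an assumption
median_delta = median(deltas)  # a simple summary for comparing search engines
```

A smaller median deviation across a set of reported PSMs would suggest better agreement with the chromatography, which is the sense in which the metric can be used to compare identification methods.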
|
169
|
Omenn GS. Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years. Mol Cell Proteomics 2021; 20:100062. [PMID: 33640492 PMCID: PMC8058560 DOI: 10.1016/j.mcpro.2021.100062] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 11/12/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 02/08/2023] Open
Abstract
We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.
Affiliation(s)
- Gilbert S Omenn
  - University of Michigan Medical School, Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, Ann Arbor, Michigan, USA.
|
170
|
Petruschke H, Schori C, Canzler S, Riesbeck S, Poehlein A, Daniel R, Frei D, Segessemann T, Zimmerman J, Marinos G, Kaleta C, Jehmlich N, Ahrens CH, von Bergen M. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. Microbiome 2021; 9:55. [PMID: 33622394 PMCID: PMC7903761 DOI: 10.1186/s40168-020-00981-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Accepted: 12/16/2020] [Indexed: 05/13/2023]
Abstract
BACKGROUND The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. RESULTS We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. 
CONCLUSIONS We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.
Affiliation(s)
- Hannes Petruschke
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Christian Schori
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Sebastian Canzler
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Sarah Riesbeck
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Anja Poehlein
  - Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
- Rolf Daniel
  - Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
- Daniel Frei
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Tina Segessemann
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Johannes Zimmerman
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Georgios Marinos
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Christoph Kaleta
  - Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
- Nico Jehmlich
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Christian H Ahrens
  - Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland.
- Martin von Bergen
  - Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany.
  - Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany.
|
171
|
Son M, Kim H, Han D, Kim Y, Huh I, Han Y, Hong SM, Kwon W, Kim H, Jang JY, Kim Y. A Clinically Applicable 24-Protein Model for Classifying Risk Subgroups in Pancreatic Ductal Adenocarcinomas using Multiple Reaction Monitoring-Mass Spectrometry. Clin Cancer Res 2021; 27:3370-3382. [PMID: 33593883 DOI: 10.1158/1078-0432.ccr-20-3513] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/06/2020] [Revised: 01/12/2021] [Accepted: 02/12/2021] [Indexed: 11/16/2022]
Abstract
PURPOSE Pancreatic ductal adenocarcinoma (PDAC) subtypes have been identified using various methodologies. However, it is a challenge to develop a classification system applicable to routine clinical evaluation. We aimed to identify risk subgroups based on molecular features and develop a classification model that was more suited for clinical applications. EXPERIMENTAL DESIGN We collected whole dissected specimens from 225 patients who underwent surgery at Seoul National University Hospital [Seoul, Republic of Korea (South)], between October 2009 and February 2018. Target proteins with potential relevance to tumor progression or prognosis were quantified with robust quality controls. We used hierarchical clustering analysis to identify risk subgroups. A random forest classification model was developed to predict the identified risk subgroups, and the model was validated using transcriptomic datasets from external cohorts (N = 700), with survival analysis. RESULTS We identified 24 protein features that could classify the four risk subgroups associated with patient outcomes: stable, exocrine-like, activated, and extracellular matrix (ECM) remodeling. The "stable" risk subgroup was characterized by proteins that were associated with differentiation and tumor suppressors. "Exocrine-like" tumors highly expressed pancreatic enzymes. Two high-risk subgroups, "activated" and "ECM remodeling," were enriched in terms such as cell cycle, angiogenesis, immunocompetence, tumor invasion metastasis, and metabolic reprogramming. The classification model that included these features made prognoses with relative accuracy and precision in multiple cohorts. CONCLUSIONS We proposed PDAC risk subgroups and developed a classification model that may potentially be useful for routine clinical implementations, at the individual level. This clinical system may improve the accuracy of risk prediction and treatment guidelines. See related commentary by Thakur and Singh, p. 3272.
Affiliation(s)
- Minsoo Son
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Hongbeom Kim
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Dohyun Han
  - Biomedical Research Institute, Seoul National University Hospital, Seoul, Republic of Korea (South)
- Yoseop Kim
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Iksoo Huh
  - College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, Republic of Korea (South)
- Youngmin Han
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Seung-Mo Hong
  - Department of Pathology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea (South)
- Wooil Kwon
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Haeryoung Kim
  - Department of Pathology, Seoul National University College of Medicine, Seoul, Republic of Korea (South)
- Jin-Young Jang
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea (South).
- Youngsoo Kim
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea (South).
|
172
|
Swamy KBS, Schuyler SC, Leu JY. Protein Complexes Form a Basis for Complex Hybrid Incompatibility. Front Genet 2021; 12:609766. [PMID: 33633780 PMCID: PMC7900514 DOI: 10.3389/fgene.2021.609766] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
Proteins are the workhorses of the cell and execute many of their functions by interacting with other proteins to form protein complexes. Multi-protein complexes are an admixture of subunits, change their interaction partners, and modulate their functions and cellular physiology in response to environmental changes. When two species mate, the hybrid offspring are usually inviable or sterile because of large-scale differences in the genetic makeup between the two parents causing incompatible genetic interactions. Such reciprocal-sign epistasis between inter-specific alleles is not limited to incompatible interactions between just one gene pair and usually involves multiple genes. Many of these multi-locus incompatibilities show visible defects only in the presence of all the interactions, making them hard to characterize. Understanding the dynamics of protein-protein interactions (PPIs) leading to multi-protein complexes is better suited to characterize multi-locus incompatibilities, compared to studying them with traditional approaches of genetics and molecular biology. The advances in omics technologies, which include genomics, transcriptomics, and proteomics, can help achieve this end. This is especially relevant when studying non-model organisms. Here, we discuss the recent progress in the understanding of hybrid genetic incompatibility, omics technologies, and how together they have helped in characterizing protein complexes and in turn multi-locus incompatibilities. We also review advances in bioinformatic techniques suitable for this purpose and propose directions for leveraging the knowledge gained from model organisms to identify genetic incompatibilities in non-model organisms.
Affiliation(s)
- Krishna B. S. Swamy
  - Division of Biological and Life Sciences, School of Arts and Sciences, Ahmedabad University, Ahmedabad, India
- Scott C. Schuyler
  - Department of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Division of Head and Neck Surgery, Department of Otolaryngology, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Jun-Yi Leu
  - Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
|
173
|
Gunnarsson S, Prabakaran S. In silico identification of novel open reading frames in Plasmodium falciparum oocyte and salivary gland sporozoites using proteogenomics framework. Malar J 2021; 20:71. [PMID: 33546698 PMCID: PMC7866754 DOI: 10.1186/s12936-021-03598-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/16/2021] [Indexed: 11/25/2022] Open
Abstract
Background Plasmodium falciparum causes the deadliest form of malaria, which remains one of the most prevalent infectious diseases. Unfortunately, the only licensed vaccine showed limited protection and resistance to anti-malarial drugs is increasing, which can be largely attributed to the biological complexity of the parasite’s life cycle. The progression from one developmental stage to another in P. falciparum involves drastic changes in gene expression, and its infectivity to human hosts varies greatly depending on the stage. Approaches to identify candidate genes that are responsible for the development of infectivity to human hosts typically involve differential gene expression analysis between stages. However, the detection may be limited to annotated proteins and open reading frames (ORFs) predicted using restrictive criteria. Methods The above problem is particularly relevant for P. falciparum, whose genome annotation is relatively incomplete given its clinical significance. In this work, a systems proteogenomics approach was used to address this challenge, as it allows computational detection of unannotated, novel Open Reading Frames (nORFs), which are neglected by conventional analyses. Two pairs of transcriptome/proteome were obtained from a previous study where one was collected in the mosquito-infectious oocyst sporozoite stage, and the other in the salivary gland sporozoite stage with human infectivity. They were then re-analysed using the proteogenomics framework to identify nORFs in each stage. Results Translational products of nORFs that map to antisense, intergenic, intronic, 3′ UTR and 5′ UTR regions, as well as alternative reading frames of canonical proteins were detected. Some of these nORFs also showed differential expression between the two life cycle stages studied. Their regulatory roles were explored through further bioinformatics analyses including the expression regulation on the parent reference genes, in silico structure prediction, and gene ontology term enrichment analysis. Conclusion The identification of nORFs in P. falciparum sporozoites highlights the biological complexity of the parasite. Although the analyses are solely computational, these results provide a starting point for further experimental validation of the existence and functional roles of these nORFs.
Affiliation(s)
- Sophie Gunnarsson
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Sudhakaran Prabakaran
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK.
|
174
|
Ferreira JA, Relvas-Santos M, Peixoto A, M N Silva A, Lara Santos L. Glycoproteogenomics: Setting the Course for Next-generation Cancer Neoantigen Discovery for Cancer Vaccines. Genomics, Proteomics & Bioinformatics 2021; 19:25-43. [PMID: 34118464 PMCID: PMC8498922 DOI: 10.1016/j.gpb.2021.03.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/11/2020] [Revised: 01/25/2021] [Accepted: 03/01/2021] [Indexed: 12/24/2022]
Abstract
Molecular-assisted precision oncology gained tremendous ground with high-throughput next-generation sequencing (NGS), supported by robust bioinformatics. The quest for genomics-based cancer medicine set the foundations for improved patient stratification, while unveiling a wide array of neoantigens for immunotherapy. Upfront pre-clinical and clinical studies have successfully used tumor-specific peptides in vaccines with minimal off-target effects. However, the low mutational burden presented by many lesions challenges the generalization of these solutions, requiring the diversification of neoantigen sources. Oncoproteogenomics utilizing customized databases for protein annotation by mass spectrometry (MS) is a powerful tool toward this end. Expanding the concept toward exploring proteoforms originated from post-translational modifications (PTMs) will be decisive to improve molecular subtyping and provide potentially targetable functional nodes with increased cancer specificity. Walking through the path of systems biology, we highlight that alterations in protein glycosylation at the cell surface not only have functional impact on cancer progression and dissemination but also originate unique molecular fingerprints for targeted therapeutics. Moreover, we discuss the outstanding challenges required to accommodate glycoproteomics in oncoproteogenomics platforms. We envisage that such rationale may flag a rather neglected research field, generating novel paradigms for precision oncology and immunotherapy.
Affiliation(s)
- José Alexandre Ferreira
  - Experimental Pathology and Therapeutics Group, Portuguese Institute of Oncology, Porto 4200-072, Portugal; Institute of Biomedical Sciences Abel Salazar, University of Porto, Porto 4050-313, Portugal; Porto Comprehensive Cancer Center (P.ccc), Porto 4200-072, Portugal.
- Marta Relvas-Santos
  - Experimental Pathology and Therapeutics Group, Portuguese Institute of Oncology, Porto 4200-072, Portugal; Institute of Biomedical Sciences Abel Salazar, University of Porto, Porto 4050-313, Portugal; REQUIMTE-LAQV, Department of Chemistry and Biochemistry, Faculty of Sciences of the University of Porto, Porto 4169-007, Portugal
- Andreia Peixoto
  - Experimental Pathology and Therapeutics Group, Portuguese Institute of Oncology, Porto 4200-072, Portugal; Institute of Biomedical Sciences Abel Salazar, University of Porto, Porto 4050-313, Portugal
- André M N Silva
  - REQUIMTE-LAQV, Department of Chemistry and Biochemistry, Faculty of Sciences of the University of Porto, Porto 4169-007, Portugal
- Lúcio Lara Santos
  - Experimental Pathology and Therapeutics Group, Portuguese Institute of Oncology, Porto 4200-072, Portugal; Institute of Biomedical Sciences Abel Salazar, University of Porto, Porto 4050-313, Portugal; Porto Comprehensive Cancer Center (P.ccc), Porto 4200-072, Portugal
|
175
|
Verma A, Halder A, Marathe S, Purwar R, Srivastava S. A proteogenomic approach to target neoantigens in solid tumors. Expert Rev Proteomics 2021; 17:797-812. [PMID: 33491499 DOI: 10.1080/14789450.2020.1881889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
INTRODUCTION Proteogenomic techniques find applications in identifying novel cancer-specific peptides called neoantigens; they are non-self peptides derived from tumor-specific non-synonymous mutations. These peptides, presented by MHCs, are recognized by T cells and induce an antitumor response. Due to their selective expression in tumor cells, neoantigens are considered attractive targets for cancer immunotherapy. AREAS COVERED In this review, we have discussed the proteogenomic strategies to identify neoantigens. We have also provided a neoantigen identification pipeline using data from whole-exome sequencing, RNA sequencing, and MHC peptidomics. Further, we have reviewed recent tools for neoantigen discovery. EXPERT COMMENTARY The limitations in instrument sensitivity and availability of bioinformatics tools have restricted the identification of neoantigens from tumor samples. Nonetheless, the recent improvement in genome sequencing, mass spectrometry technologies, and the development of reliable algorithms for epitope prediction provide hope for efficient identification of neoantigens. Translating this workflow to patient samples would represent a massive advancement in neoantigen identification methods, leading to the constitution of novel personalized neoantigen cancer vaccines.
Affiliation(s)
- Ayushi Verma
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Halder
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Soumitra Marathe
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Rahul Purwar
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Sanjeeva Srivastava
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
|
176
|
Gagnon M, Savard M, Jacques JF, Bkaily G, Geha S, Roucou X, Gobeil F. Potentiation of B2 receptor signaling by AltB2R, a newly identified alternative protein encoded in the human bradykinin B2 receptor gene. J Biol Chem 2021; 296:100329. [PMID: 33497625 PMCID: PMC7949122 DOI: 10.1016/j.jbc.2021.100329] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/19/2020] [Revised: 01/12/2021] [Accepted: 01/21/2021] [Indexed: 12/27/2022] Open
Abstract
Recent functional and proteomic studies in eukaryotes (www.openprot.org) predict the translation of alternative open reading frames (AltORFs) in mature G-protein-coupled receptor (GPCR) mRNAs, including that of bradykinin B2 receptor (B2R). Our main objective was to determine the implication of a newly discovered AltORF resulting protein, termed AltB2R, in the known signaling properties of B2R using complementary methodological approaches. When ectopically expressed in HeLa cells, AltB2R presented predominant punctate cytoplasmic/perinuclear distribution and apparent cointeraction with B2R at plasma and endosomal/vesicular membranes. The presence of AltB2R increases intracellular [Ca2+] and ERK1/2-MAPK activation (via phosphorylation) following B2R stimulation. Moreover, HEK293A cells expressing mutant B2R lacking concomitant expression of AltB2R displayed significantly decreased maximal responses in agonist-stimulated Gαq-Gαi2/3-protein coupling, IP3 generation, and ERK1/2-MAPK activation as compared with wild-type controls. Conversely, there was no difference in cell-surface density as well as ligand-binding properties of B2R and in efficiencies of cognate agonists at promoting B2R internalization and β-arrestin 2 recruitment. Importantly, both AltB2R and B2R proteins were overexpressed in prostate and breast cancers, compared with their normal counterparts suggesting new associative roles of AltB2R in these diseases. Our study shows that BDKRB2 is a dual-coding gene and identifies AltB2R as a novel positive modulator of some B2R signaling pathways. More broadly, it also supports a new, unexpected alternative proteome for GPCRs, which opens new frontiers in fields of GPCR biology, diseases, and drug discovery.
Affiliation(s)
- Maxime Gagnon
  - Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Martin Savard
  - Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-François Jacques
  - Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Ghassan Bkaily
  - Department of Immunology & Cellular Biology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sameh Geha
  - Department of Pathology, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, Canada
- Xavier Roucou
  - Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada.
- Fernand Gobeil
  - Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada.
|
177
|
Colistin Dependence in Extensively Drug-Resistant Acinetobacter baumannii Strain Is Associated with ISAjo2 and ISAba13 Insertions and Multiple Cellular Responses. Int J Mol Sci 2021; 22:ijms22020576. [PMID: 33430070 PMCID: PMC7827689 DOI: 10.3390/ijms22020576] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/17/2020] [Revised: 01/05/2021] [Accepted: 01/06/2021] [Indexed: 02/06/2023] Open
Abstract
The nosocomial opportunistic Gram-negative bacterial pathogen Acinetobacter baumannii is resistant to multiple antimicrobial agents and an emerging global health problem. The polymyxin antibiotic colistin, targeting the negatively charged lipid A component of the lipopolysaccharide on the bacterial cell surface, is often considered as the last-resort treatment, but resistance to colistin is unfortunately increasing worldwide. Notably, colistin-susceptible A. baumannii can also develop a colistin dependence after exposure to this drug in vitro. Colistin dependence might represent a stepping stone to resistance also in vivo. However, the mechanisms are far from clear. To address this issue, we combined proteogenomics, high-resolution microscopy, and lipid profiling to characterize and compare an A. baumannii colistin-susceptible clinical isolate (Ab-S) to its colistin-dependent subpopulation (Ab-D) obtained after subsequent passages in moderate colistin concentrations. Incidentally, in the colistin-dependent subpopulation the lpxA gene was disrupted by insertion of ISAjo2, the lipid A biosynthesis terminated, and Ab-D cells displayed a lipooligosaccharide (LOS)-deficient phenotype. Moreover, both mlaD and pldA genes were perturbed by insertions of ISAjo2 and ISAba13, and LOS-deficient bacteria displayed a capsule with decreased thickness as well as other surface imperfections. The major changes in relative protein abundance levels were detected in type 6 secretion system (T6SS) components, the resistance-nodulation-division (RND)-type efflux pumps, and in proteins involved in maintenance of outer membrane asymmetry. These findings suggest that colistin dependence in A. baumannii involves an ensemble of mechanisms seen in resistance development and accompanied by complex cellular events related to insertional sequences (ISs)-triggered LOS-deficiency. To our knowledge, this is the first study demonstrating the involvement of ISAjo2 and ISAba13 IS elements in the modulation of the lipid A biosynthesis and associated development of dependence on colistin.
|
178
|
Fabre B, Combier JP, Plaza S. Recent advances in mass spectrometry-based peptidomics workflows to identify short-open-reading-frame-encoded peptides and explore their functions. Curr Opin Chem Biol 2021; 60:122-130. [PMID: 33401134 DOI: 10.1016/j.cbpa.2020.12.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/16/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022]
Abstract
Short open reading frame (sORF)-encoded polypeptides (SEPs) have recently emerged as key regulators of major cellular processes. Computational methods for the annotation of sORFs combined with transcriptomics and ribosome profiling approaches predicted the existence of tens of thousands of SEPs across the kingdoms of life, although we still lack unambiguous evidence for most of them. The method of choice to validate the expression of SEPs is mass spectrometry (MS)-based peptidomics. Peptides are less abundant than proteins, which tends to hinder their detection. Therefore, optimization and enrichment methods are necessary to validate the existence of SEPs. In this article, we discuss the challenges for the detection of SEPs by MS and recent developments of biochemical approaches applied to the study of these peptides. We detail the advances made in the different key steps of a typical peptidomics workflow and highlight possible alternatives that have not been explored yet.
Affiliation(s)
- Bertrand Fabre
  - Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France.
- Jean-Philippe Combier
  - Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
- Serge Plaza
  - Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
|
179
|
Diz AP, Sánchez-Marín P. A Primer and Guidelines for Shotgun Proteomic Analysis in Non-model Organisms. Methods Mol Biol 2021; 2259:77-102. [PMID: 33687710 DOI: 10.1007/978-1-0716-1178-4_6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/04/2022]
Abstract
During the last decade, we have witnessed outstanding advances in proteomics, led mostly by great technological improvements in the mass spectrometry field that allow high-throughput production of high-quality data used for massive protein identification and quantification. From a practical viewpoint, these advances have been mainly exploited in research projects involving model organisms with abundant genomic and proteomic information available in public databases. However, there is a growing number of organisms of high interest in different disciplines, such as ecological, biotechnological, and evolutionary research, that are still poorly represented in these databases. Important advances in massively parallel sequencing technology and the easy accessibility of this technology to many research laboratories have now made it possible to produce customized genomic and proteomic databases of any organism. Along this line, the use of proteogenomic approaches, which combine in the same analysis the data obtained from different omic levels, has emerged as a very useful and powerful strategy to run shotgun proteomic experiments especially focused on non-model organisms. In this chapter, we provide detailed procedures to undertake shotgun quantitative proteomic experiments following either a label-free or an isobaric labeling approach in non-model organisms, emphasizing also a few key aspects related to experimental design and data analysis.
Affiliation(s)
- Angel P Diz
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain; Marine Research Center, University of Vigo (CIM-UVIGO), Vigo, Spain.
| | - Paula Sánchez-Marín
- Centro Oceanográfico de Vigo, Instituto Español de Oceanografía, Vigo, Spain
| |
|
180
|
Jorge GL, Balbuena TS. Identification of novel protein-coding sequences in Eucalyptus grandis plants by high-resolution mass spectrometry. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1869:140594. [PMID: 33385527 DOI: 10.1016/j.bbapap.2020.140594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2020] [Revised: 12/11/2020] [Accepted: 12/23/2020] [Indexed: 10/22/2022]
Abstract
Eucalyptus species are widely used in the forestry industry, and a significant increase in the number of sequences available in database repositories has been observed for these species. In proteomics, a protein is identified by correlating the theoretical fragmentation spectrum derived from genomic/transcriptomic data with the experimental fragmentation mass spectrum acquired from large-scale analysis of protein mixtures. Proteogenomics is an alternative approach that can identify novel proteins encoded by regions previously considered non-coding. This study aimed to confidently identify and confirm the existence of previously unknown protein-coding sequences in the Eucalyptus grandis genome. To this end, we used a modified spectral correlation strategy and a dedicated de novo peptide sequencing pipeline. Using this strategy, we confidently identified 41 novel peptide forms and six peptides containing at least one single amino acid substitution. The most represented genomic class of novel peptides originated from alternative reading frames. In contrast, no clear single amino acid substitution pattern was identified. The identifications were validated using a parallel reaction monitoring approach, which provided further mass spectrometry support for the existence of the novel peptide sequences. Data are available via ProteomeXchange with identifier PXD022110.
Affiliation(s)
- Gabriel Lemes Jorge
- Sao Paulo State University, Department of Technology, Jaboticabal, Sao Paulo, Brazil.
| | | |
|
181
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS: PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues inherent in proteogenomic data analysis, where the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, with a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and give recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make the case that high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
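The six-frame translation database mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed tools; the function names (`six_frame`, `translate`, `reverse_complement`) are hypothetical, and the standard genetic code is assumed. A genome of L nucleotides yields six frames of roughly L/3 residues each, so the search space is about 2L residues, which helps explain why it is so much larger than an annotated protein database.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate a DNA sequence in three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCCTAA").items():
        print(name, protein)
```

A real pipeline would additionally split each frame at stop codons and discard very short stretches before digesting the remainder in silico; those steps are omitted here to keep the sketch short.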
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| |
|
182
|
Yang M, Zhu Z, Zhuang Z, Bai Y, Wang S, Ge F. Proteogenomic Characterization of the Pathogenic Fungus Aspergillus flavus Reveals Novel Genes Involved in Aflatoxin Production. Mol Cell Proteomics 2020; 20:100013. [PMID: 33568340 PMCID: PMC7950108 DOI: 10.1074/mcp.ra120.002144] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2020] [Revised: 10/06/2020] [Accepted: 11/24/2020] [Indexed: 12/20/2022] Open
Abstract
Aspergillus flavus (A. flavus), a pathogenic fungus, can produce carcinogenic and toxic aflatoxins that are a serious agricultural and medical threat worldwide. Attempts to decipher the aflatoxin biosynthetic pathway have been hampered by the lack of a high-quality genome annotation for A. flavus. To address this gap, we performed a comprehensive proteogenomic analysis using high-accuracy mass spectrometry data for this pathogen. The resulting high-quality data set confirmed the translation of 8724 previously predicted genes and identified 732 novel proteins, 269 splice variants, 447 single amino acid variants, and 188 revised genes. A subset of novel proteins was experimentally validated by RT-PCR and synthetic peptides. Further functional annotation suggested that a number of the identified novel proteins may play roles in aflatoxin biosynthesis and stress responses in A. flavus. This comprehensive strategy also identified a wide range of posttranslational modifications (PTMs), including 3461 modification sites from 1765 proteins. Functional analysis suggested the involvement of these modified proteins in the regulation of cellular metabolic and aflatoxin biosynthetic pathways. Together, we provided a high-quality annotation of the A. flavus genome and revealed novel insights into the mechanisms of aflatoxin production and pathogenicity in this pathogen.
Affiliation(s)
- Mingkun Yang
- School of Life Sciences, and Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou, China; State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Zhuo Zhu
- School of Life Sciences, and Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Zhenhong Zhuang
- School of Life Sciences, and Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Youhuang Bai
- School of Life Sciences, and Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shihua Wang
- School of Life Sciences, and Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou, China.
| | - Feng Ge
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.
| |
|
183
|
Krassowski M, Das V, Sahu SK, Misra BB. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front Genet 2020; 11:610798. [PMID: 33362867 PMCID: PMC7758509 DOI: 10.3389/fgene.2020.610798] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 09/27/2020] [Accepted: 11/20/2020] [Indexed: 12/24/2022] Open
Abstract
Multi-omics, variously called integrated omics, pan-omics, and trans-omics, aims to combine two or more omics data sets to aid in data analysis, visualization and interpretation to determine the mechanism of a biological process. Multi-omics efforts have taken center stage in biomedical research leading to the development of new insights into biological events and processes. However, the mushrooming of a myriad of tools, datasets, and approaches tends to inundate the literature and overwhelm researchers new to the field. The aims of this review are to provide an overview of the current state of the field, inform on available reliable resources, discuss the application of statistics and machine/deep learning in multi-omics analyses, discuss findable, accessible, interoperable, reusable (FAIR) research, and point to best practices in benchmarking. Thus, we provide guidance to interested users of the domain by addressing challenges of the underlying biology, giving an overview of the available toolset, addressing common pitfalls, and acknowledging current methods' limitations. We conclude with practical advice and recommendations on software engineering and reproducibility practices to share a comprehensive awareness with new researchers in multi-omics for end-to-end workflow.
Affiliation(s)
- Michal Krassowski
- Nuffield Department of Women’s & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Vivek Das
- Novo Nordisk Research Center Seattle, Inc, Seattle, WA, United States
| | | | | |
|
184
|
Casimiro-Soriguer CS, Rigual MM, Brokate-Llanos AM, Muñoz MJ, Garzón A, Pérez-Pulido AJ, Jimenez J. Using AnABlast for intergenic sORF prediction in the Caenorhabditis elegans genome. Bioinformatics 2020; 36:4827-4832. [PMID: 32614398 PMCID: PMC7723330 DOI: 10.1093/bioinformatics/btaa608] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2020] [Revised: 06/21/2020] [Accepted: 06/23/2020] [Indexed: 11/29/2022] Open
Abstract
Motivation: Short bioactive peptides encoded by small open reading frames (sORFs) play important roles in eukaryotes. Bioinformatics prediction of ORFs is an early step in genome sequence analysis, but sORFs encoding short peptides, often using non-AUG initiation codons, are not easily discriminated from false ORFs occurring by chance. Results: AnABlast is a computational tool designed to highlight putative protein-coding regions in genomic DNA sequences. This protein-coding finder is independent of ORF length and reading frame shifts, which makes AnABlast a potentially useful tool for predicting sORFs. Using this algorithm, we report the identification of 82 putative new intergenic sORFs in the Caenorhabditis elegans genome. Sequence similarity, motif presence, expression data and RNA interference experiments support the view that the highlighted sORFs likely encode functional peptides, encouraging the use of AnABlast as a new approach for the accurate prediction of intergenic sORFs in annotated eukaryotic genomes. Availability and implementation: AnABlast is freely available at http://www.bioinfocabd.upo.es/ab/. The C. elegans genome browser with AnABlast results, annotated genes and all data used in this study is available at http://www.bioinfocabd.upo.es/celegans. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- C S Casimiro-Soriguer
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - M M Rigual
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - A M Brokate-Llanos
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - M J Muñoz
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - A Garzón
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - A J Pérez-Pulido
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - J Jimenez
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
| |
|
185
|
Mullins Y, Keogh K, Blackshields G, Kenny DA, Kelly AK, Waters SM. Transcriptome assisted label free proteomics of hepatic tissue in response to both dietary restriction and compensatory growth in cattle. J Proteomics 2020; 232:104048. [PMID: 33217582 DOI: 10.1016/j.jprot.2020.104048] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/06/2020] [Revised: 10/15/2020] [Accepted: 11/10/2020] [Indexed: 11/28/2022]
Abstract
Compensatory growth (CG) is a naturally occurring phenomenon where, following a period of undernutrition, an animal exhibits accelerated growth upon re-alimentation. The objective was to identify and quantify hepatic proteins involved in the regulation of CG in cattle. Forty Holstein Friesian bulls were equally assigned to one of four groups. Groups A1 and A2 had ad libitum access to feed for 125 days, while groups R1 and R2 were feed-restricted. Following this, R1 and A1 animals were slaughtered. The remaining animals (R2 and A2) were slaughtered following ad libitum feeding for a further 55 days. At slaughter, hepatic tissue samples were collected and label-free quantitative proteomics was undertaken, with spectra searched against a custom-built transcriptome database specific to the animals in this study. Twenty-four differentially abundant proteins were identified during CG (R2 vs. R1), including PSPH, ASNS and GSTM1, which are involved in nutrient metabolism, immune response and cellular growth. Proteins involved in biochemical pathways related to nutrient metabolism were down-regulated during CG, indicating a possible adaptive response by the liver to a period of fluctuating nutrient availability. The liver's ability to regulate its metabolic activity may have profound effects on the efficiency of whole-body energy utilization during CG. SIGNIFICANCE: This study is the first to unravel the effect of compensatory growth on the hepatic proteome of cattle using transcriptome-assisted shotgun proteomics. Proteins identified as being affected by dietary restriction and subsequent expression of compensatory growth in this study may, following appropriate validation, contribute to the identification of functional genetic variants. Such information could be harnessed within the context of genomic selection in cattle breeding programs to identify animals with a greater genetic potential to undergo compensatory growth, thus increasing the profitability of the beef sector and accelerating genetic gain.
Affiliation(s)
- Yvonne Mullins
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland; School of Agriculture and Food Science, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Kate Keogh
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland
| | - Gordon Blackshields
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland
| | - David A Kenny
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland
| | - Alan K Kelly
- School of Agriculture and Food Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Sinéad M Waters
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland
| |
|
186
|
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
| |
|
187
|
Dagamajalu S, Vijayakumar M, Shetty R, Rex DAB, Narayana Kotimoole C, Prasad TSK. Proteogenomic examination of esophageal squamous cell carcinoma (ESCC): new lines of inquiry. Expert Rev Proteomics 2020; 17:649-662. [PMID: 33151123 DOI: 10.1080/14789450.2020.1845146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Introduction: Esophageal squamous cell carcinoma (ESCC), a histopathologic subtype of esophageal cancer, is a major cause of cancer-related morbidity and mortality worldwide. This is primarily because patients are diagnosed at an advanced stage by the time symptoms appear. Genomics and mass spectrometry-based proteomics continue to provide important leads toward biomarker discovery for ESCC. However, such leads are yet to be translated into clinical utilities. Areas covered: We gathered information pertaining to proteomics and proteogenomics efforts in ESCC from a literature search covering publications until 2020. An overview of omics approaches to discover candidate biomarkers for ESCC is presented. We summarize recent investigations of alterations in the level of gene and protein expression observed in biological samples including body fluids, tissue/biopsy and in vitro-based models. Expert opinion: A large number of protein-based biomarkers and therapeutic targets are being used in cancer therapy. Several candidates are being developed as diagnostics and prognostics for the management of cancers. High-resolution proteomic and proteogenomic approaches offer an efficient way to identify additional candidate biomarkers for diagnosis, monitoring of disease progression, and prediction of response to chemotherapy and radiotherapy. Some of these biomarkers can also be developed as therapeutic targets.
Affiliation(s)
- Shobha Dagamajalu
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to Be University), Mangalore, India
| | - Manavalan Vijayakumar
- Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Mangalore, India
| | - Rohan Shetty
- Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Mangalore, India
| | - D A B Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to Be University), Mangalore, India
| | - Chinmaya Narayana Kotimoole
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to Be University), Mangalore, India
| | - T S Keshava Prasad
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to Be University), Mangalore, India
| |
|
188
|
Koşaloğlu-Yalçın Z, Sidney J, Chronister W, Peters B, Sette A. Comparison of HLA ligand elution data and binding predictions reveals varying prediction performance for the multiple motifs recognized by HLA-DQ2.5. Immunology 2020; 162:235-247. [PMID: 33064841 PMCID: PMC7808151 DOI: 10.1111/imm.13279] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/15/2020] [Revised: 10/06/2020] [Accepted: 10/07/2020] [Indexed: 12/02/2022] Open
Abstract
Binding prediction tools are commonly used to identify peptides presented on MHC class II molecules. Recently, a wealth of data in the form of naturally eluted ligands has become available, and discrepancies between ligand elution data and binding predictions have been reported. Quantitative metrics for such comparisons are currently lacking. In this study, we assessed how efficiently MHC class II binding predictions can identify naturally eluted peptides, and investigated instances with discrepancies between the two methods in detail. We found that, in general, MHC class II eluted ligands are predicted to bind to their reported restriction element with high affinity. However, for several studies reporting an increased number of ligands that were not predicted to bind, we found that the reported MHC restriction was ambiguous. Additional analyses determined that most of the ligands predicted not to bind are predicted to bind other co‐expressed MHC class II molecules. For selected alleles, we addressed discrepancies between elution data and binding predictions by experimental measurements and found that predicted and measured affinities correlate well. For DQA1*05:01/DQB1*02:01 (DQ2.5), however, binding predictions missed several peptides that were determined experimentally to be binders. For these peptides and several known DQ2.5 binders, we determined key residues for conferring DQ2.5 binding capacity, which revealed that DQ2.5 utilizes two different binding motifs, of which only one is predicted effectively. These findings have important implications for the interpretation of ligand elution data and for the improvement of MHC class II binding predictions.
Affiliation(s)
| | - John Sidney
- La Jolla Institute for Immunology, La Jolla, CA, USA
| | | | - Bjoern Peters
- La Jolla Institute for Immunology, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Alessandro Sette
- La Jolla Institute for Immunology, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| |
|
189
|
Proteomics in thyroid cancer and other thyroid-related diseases: A review of the literature. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140510. [DOI: 10.1016/j.bbapap.2020.140510] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2020] [Revised: 06/26/2020] [Accepted: 07/19/2020] [Indexed: 12/21/2022]
|
190
|
Wang L, Liu K, Li S, Tang H. A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing. Proteomics 2020; 20:e2000002. [PMID: 32415809 PMCID: PMC7669687 DOI: 10.1002/pmic.202000002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/01/2020] [Revised: 04/17/2020] [Indexed: 01/07/2023]
Abstract
With the accumulation of MS/MS spectra collected in spectral libraries, spectral library searching has emerged as an important approach for peptide identification in proteomics, complementary to the commonly used protein database searching approach, in particular for the proteomic analyses of well-studied model organisms, such as human. Existing spectral library searching algorithms compare a query MS/MS spectrum with each spectrum in the library with matched precursor mass and charge state, which may become computationally intensive with the rapidly growing library size. Here, the software msSLASH, which implements a fast spectral library searching algorithm based on the Locality-Sensitive Hashing (LSH) technique, is presented. The algorithm first converts the library and query spectra into bit-strings using LSH functions, and then computes the similarity only between spectra with highly similar bit-strings. Using spectral library searches of large real-world MS/MS spectra datasets, it is demonstrated that the algorithm significantly reduces the number of spectral comparisons and, as a result, achieves a 2-9X speedup in comparison with the existing spectral library searching algorithm SpectraST. The spectral searching algorithm is implemented in C/C++, and is ready to be used in proteomic data analyses.
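The idea of hashing spectra to bit-strings and comparing only spectra that share a hash can be sketched as follows. This is a generic random-hyperplane LSH illustration in Python, not the msSLASH implementation (which the abstract says is written in C/C++); the bin width, number of bits, and all function names are assumptions chosen for illustration, and the precursor mass and charge filter is left out.

```python
import random
from collections import defaultdict


def bin_spectrum(peaks, n_bins=200, max_mz=2000.0):
    """Convert (m/z, intensity) peaks into a fixed-length intensity vector."""
    vec = [0.0] * n_bins
    width = max_mz / n_bins
    for mz, intensity in peaks:
        idx = min(int(mz / width), n_bins - 1)
        vec[idx] += intensity
    return vec


def make_hyperplanes(n_bits=8, n_bins=200, seed=0):
    """Draw one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n_bits)]


def signature(vec, planes):
    """Bit-string: which side of each hyperplane the spectrum vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes
    )


def cosine(a, b):
    """Cosine similarity between two binned spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def build_index(library, planes):
    """Group library spectra by their LSH signature."""
    index = defaultdict(list)
    for name, peaks in library.items():
        vec = bin_spectrum(peaks)
        index[signature(vec, planes)].append((name, vec))
    return index


def search(query_peaks, index, planes):
    """Score the query only against library spectra in the same bucket."""
    qvec = bin_spectrum(query_peaks)
    candidates = index.get(signature(qvec, planes), [])
    return sorted(((cosine(qvec, vec), name) for name, vec in candidates), reverse=True)


if __name__ == "__main__":
    library = {
        "pepA": [(175.1, 50.0), (432.2, 100.0), (880.4, 30.0)],
        "pepB": [(147.1, 80.0), (610.3, 40.0), (1200.6, 90.0)],
    }
    planes = make_hyperplanes()
    index = build_index(library, planes)
    print(search(library["pepA"], index, planes))
```

With a single hash table, similar spectra can still land in different buckets; practical LSH schemes usually query several independent tables or also probe signatures at a small Hamming distance, trading extra comparisons for recall.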
Affiliation(s)
- Lei Wang
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
| | - Kaiyuan Liu
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
| | - Sujun Li
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
| | - Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
| |
|
191
|
Bo C, Geng X, Zhang J, Sai L, Zhang Y, Yu G, Zhang Z, Liu K, Du Z, Peng C, Jia Q, Shao H. Comparative proteomic analysis of silica-induced pulmonary fibrosis in rats based on tandem mass tag (TMT) quantitation technology. PLoS One 2020; 15:e0241310. [PMID: 33119648 PMCID: PMC7595299 DOI: 10.1371/journal.pone.0241310] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/14/2020] [Accepted: 10/12/2020] [Indexed: 12/30/2022] Open
Abstract
Silicosis is a systemic disease characterized by chronic persistent inflammation and incurable pulmonary fibrosis, whose underlying molecular mechanisms remain to be fully elucidated. In this study, we employed tandem mass tag (TMT)-based quantitative proteomics to detect differentially expressed proteins (DEPs) in lung tissues of silica-exposed rats. A total of 285 DEPs (145 upregulated and 140 downregulated) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to predict the biological pathways and functional classification of the proteins. Results showed that these DEPs were mainly enriched in the phagosome, lysosome function, the complement and coagulation cascades, glutathione metabolism, focal adhesion and ECM-receptor interactions. To validate the proteomics data, we selected six proteins (CD14, PSAP, GM2A, COL1A1, ITGA8 and CLDN5) and analyzed their expression trends using parallel reaction monitoring (PRM). The consistent results between PRM and TMT indicated the reliability of our proteomic data. These findings will help to reveal the pathogenesis of silicosis and provide potential therapeutic targets. Data are available via ProteomeXchange with identifier PXD020625.
Affiliation(s)
- Cunxiang Bo
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Xiao Geng
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Juan Zhang
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Linlin Sai
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Yu Zhang
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Gongchang Yu
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Zhenling Zhang
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Kai Liu
- Department of Cardiovascular Surgery, Qilu Hospital of Shandong University, Ji’nan, Shandong, China
| | - Zhongjun Du
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
| | - Cheng Peng
- Queensland Alliance for Environmental Health Sciences, The University of Queensland, Brisbane, Queensland, Australia
| | - Qiang Jia
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
- * E-mail: (QJ); (HS)
| | - Hua Shao
- Shandong Academy of Occupational Health and Occupational Medicine, Shandong First Medical University & Shandong Academy of Medical Sciences, Ji’nan, Shandong, China
- * E-mail: (QJ); (HS)
| |
|
192
|
Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage. mSystems 2020; 5:5/5/e00833-20. [PMID: 33109751 PMCID: PMC7593589 DOI: 10.1128/msystems.00833-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023] Open
Abstract
Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation. IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.
|
193
|
Comparison of different variant sequence types coupled with decoy generation methods used in concatenated target-decoy database searches for proteogenomic research. J Proteomics 2020; 231:104021. [PMID: 33148401 DOI: 10.1016/j.jprot.2020.104021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/27/2020] [Revised: 09/29/2020] [Accepted: 10/15/2020] [Indexed: 12/21/2022]
Abstract
Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are applied to store variant sequences for database searches. The protein-based database records a full-length wild-type protein sequence with the given variant events replacing the original amino acids, whereas the peptide-based database retains only the in silico digested peptides containing the variants. However, the performance of various decoy generation methods on the peptide-based variant sequence database, compared to the protein-based database, is still unclear. In this paper, we conduct a thorough comparison of target-decoy databases constructed from the above two types of databases coupled with various decoy generation methods for proteogenomic analyses. The results show that for the protein-based variant sequence database, the reverse and the pseudo reverse methods achieve similar performance for variant peptide identification. Furthermore, for the peptide-based database, the pseudo reverse method is more suitable than the widely used reverse method, as shown by identifying 6% more variant PSMs in a HEK293 cell line data set. SIGNIFICANCE: In our survey of publications on proteogenomic studies, 57% of the studies adopt the peptide-based variant sequence database coupled with the reverse method for decoy generation to construct a target-decoy database for searches. However, our results show that when using the peptide-based variant sequence database, it is better to adopt the pseudo reverse method for generating decoy sequences, so that fewer variant peptides are missed.
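The two decoy generation methods compared in this abstract can be illustrated with a minimal sketch. The abstract does not define them, so the definitions here are an assumption based on common usage: the reverse method reverses the whole sequence, while the pseudo reverse method reverses the sequence but leaves the C-terminal residue (typically the tryptic K/R cleavage site) in place. The function names, the `DECOY_` prefix, and the example peptides are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Tuple


def reverse_decoy(peptide: str) -> str:
    """Reverse the whole peptide sequence (assumed 'reverse' method)."""
    return peptide[::-1]


def pseudo_reverse_decoy(peptide: str) -> str:
    """Reverse all residues except the C-terminal one (assumed 'pseudo reverse' method)."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def build_target_decoy(
    peptides: List[str],
    make_decoy: Callable[[str], str],
    prefix: str = "DECOY_",  # hypothetical decoy label
) -> List[Tuple[str, str]]:
    """Concatenate each target peptide with its decoy into one list of (id, sequence)."""
    entries = []
    for i, pep in enumerate(peptides):
        entries.append((f"pep{i}", pep))
        entries.append((f"{prefix}pep{i}", make_decoy(pep)))
    return entries


# Illustrative variant peptides (made up for this sketch).
targets = ["PEPTIDEK", "ACDEFGHR"]
db_reverse = build_target_decoy(targets, reverse_decoy)
db_pseudo = build_target_decoy(targets, pseudo_reverse_decoy)
```

Under these assumed definitions, a pseudo reverse decoy keeps the same C-terminal residue, length, and mass as its target peptide, whereas a plain reverse decoy moves the cleavage residue to the N-terminus.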
|
194
|
Waylen LN, Nim HT, Martelotto LG, Ramialison M. From whole-mount to single-cell spatial assessment of gene expression in 3D. Commun Biol 2020; 3:602. [PMID: 33097816 PMCID: PMC7584572 DOI: 10.1038/s42003-020-01341-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 03/23/2020] [Accepted: 09/10/2020] [Indexed: 12/31/2022] Open
Abstract
Unravelling spatio-temporal patterns of gene expression is crucial to understanding core biological principles from embryogenesis to disease. Here we review emerging technologies, providing automated, high-throughput, spatially resolved quantitative gene expression data. Novel techniques expand on current benchmark protocols, expediting their incorporation into ongoing research. These approaches digitally reconstruct patterns of embryonic expression in three dimensions, and have successfully identified novel domains of expression, cell types, and tissue features. Such technologies pave the way for unbiased and exhaustive recapitulation of gene expression levels in spatial and quantitative terms, promoting understanding of the molecular origin of developmental defects, and improving medical diagnostics.
Affiliation(s)
- Lisa N Waylen
  - Australian Regenerative Medicine Institute and Systems Biology Institute, Monash University, Clayton, VIC, Australia
- Hieu T Nim
  - Australian Regenerative Medicine Institute and Systems Biology Institute, Monash University, Clayton, VIC, Australia
  - Transcriptomics and Bioinformatics Group, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Luciano G Martelotto
  - Single Cell Core Laboratory, Harvard Medical School, Department of System Biology, Boston, MA, USA
- Mirana Ramialison
  - Australian Regenerative Medicine Institute and Systems Biology Institute, Monash University, Clayton, VIC, Australia
  - Transcriptomics and Bioinformatics Group, Murdoch Children's Research Institute, Parkville, VIC, Australia
|
195
|
Han Y, Wright JM, Lau E, Lam MPY. Determining Alternative Protein Isoform Expression Using RNA Sequencing and Mass Spectrometry. STAR Protoc 2020; 1:100138. [PMID: 33377032 PMCID: PMC7757315 DOI: 10.1016/j.xpro.2020.100138] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022] Open
Abstract
Alternative splicing greatly expands the coding capacity of the human genome, but how many alternative transcripts are translated as proteins or carry functional importance remains unknown and awaits experimental verification. Here, we describe a protocol that combines transcriptomics (RNA-seq) and proteomics (mass spectrometry [MS]) analyses to identify alternative isoforms in proteomes. This workflow is applicable to custom-generated RNA-seq and MS data from matching samples, as well as the reanalysis of existing transcriptomics and proteomics datasets in public repositories. For complete details on the use and execution of this protocol, please refer to Lau et al. (2019).
Affiliation(s)
- Yu Han
  - Department of Medicine-Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Consortium for Fibrosis Research & Translation, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julianna M Wright
  - Department of Medicine-Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Edward Lau
  - Department of Medicine-Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Consortium for Fibrosis Research & Translation, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Maggie Pui Yu Lam
  - Department of Medicine-Cardiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Biochemistry & Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Consortium for Fibrosis Research & Translation, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
196
|
Leblanc S, Brunet MA. Modelling of pathogen-host systems using deeper ORF annotations and transcriptomics to inform proteomics analyses. Comput Struct Biotechnol J 2020; 18:2836-2850. [PMID: 33133425 PMCID: PMC7585943 DOI: 10.1016/j.csbj.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/29/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 01/08/2023] Open
Abstract
The Zika virus is a flavivirus that can cause fulminant outbreaks and lead to Guillain-Barré syndrome, microcephaly and fetal demise. Like other flaviviruses, the Zika virus is transmitted by mosquitoes and provokes neurological disorders. Despite its risk to public health, neither an antiviral nor a vaccine is currently available. In recent years, several studies have set out to identify human host proteins interacting with Zika viral proteins to better understand its pathogenicity. Yet these studies used standard human protein sequence databases. Such databases rely on genome annotations, which enforce a minimal open reading frame (ORF) length criterion. An ever-increasing number of studies have demonstrated the shortcomings of such annotation, which overlooks thousands of functional ORFs. Here we show that the use of a customized database including currently non-annotated proteins led to the identification of 4 alternative proteins as interactors of the viral capsid and NS4A proteins. Furthermore, 12 alternative proteins were identified in the proteome profiling of Zika-infected monocytes, one of which was significantly up-regulated. This study presents a computational framework for the re-analysis of proteomics datasets to better investigate the virus-host protein interplay upon infection with the Zika virus.
Key Words
- AP-MS, affinity-purification mass spectrometry
- Alternative ORFs
- DEP, differentially expressed proteins
- FDR, false discovery rate
- FPKM, fragments per kilobase of exon model per million reads mapped
- Flavivirus
- HCIP, highly confident interacting proteins
- HCMV, human cytomegalovirus
- LFQ, label free quantification
- MS, mass spectrometry
- ORF, open reading frame
- PSM, peptide spectrum match
- Protein network
- Proteogenomics
- Proteome profiling
- ZIKV, Zika virus
- Zika
- altProt, alternative protein
- ncRNA, non-coding RNA
- sORF, small open reading frame
Affiliation(s)
- Sebastien Leblanc
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Marie A. Brunet
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
|
197
|
Cesnik AJ, Miller RM, Ibrahim K, Lu L, Millikin RJ, Shortreed MR, Frey BL, Smith LM. Spritz: A Proteogenomic Database Engine. J Proteome Res 2020; 20:1826-1834. [PMID: 32967423 DOI: 10.1021/acs.jproteome.0c00407] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023]
Abstract
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
Affiliation(s)
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 17121, Sweden
  - Department of Genetics, Stanford University, Stanford, California 94305, United States
  - Chan Zuckerberg Biohub, San Francisco, California 94158, United States
- Rachel M Miller
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Khairina Ibrahim
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lei Lu
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Robert J Millikin
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Brian L Frey
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
|
198
|
Applying precision medicine to unmet clinical needs in psoriatic disease. Nat Rev Rheumatol 2020; 16:609-627. [PMID: 33024296 DOI: 10.1038/s41584-020-00507-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 09/07/2020] [Indexed: 02/08/2023]
Abstract
Psoriatic disease (PsD) is a heterogeneous condition that can affect peripheral and axial joints (arthritis), entheses, skin (psoriasis) and other structures. Over the past decade, considerable advances have been made both in our understanding of the pathogenesis of PsD and in the treatment of its diverse manifestations. However, several major areas of continued unmet need in the care of patients with PsD have been identified. One of these areas is the prediction of poor outcome, notably radiographic outcome in patients with psoriatic arthritis, so that stratified medicine approaches can be taken; another is predicting response to the numerous current and emerging therapies for PsD, so that precision medicine can be applied to rapidly improve clinical outcome and reduce the risk of toxicity. In order to address these needs, novel approaches, including imaging, tissue analysis and the application of proteogenomic technologies, are proposed as methodological solutions that will assist the dissection of the critical immune-metabolic pathways in this complex disease. Learning from advances made in other inflammatory diseases, it is time to address these unmet needs in a multi-centre partnership aimed at improving short-term and long-term outcomes for patients with PsD.
|
199
|
Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020; 15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.
Affiliation(s)
- Henning Schiebenhoefer
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kay Schallert
  - Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bernhard Y Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kathrin Trappe
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Emanuel Schmid
  - ID Computational & Data Science Support, Eidgenössische Technische Hochschule, Zurich, Switzerland
- Dirk Benndorf
  - Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
  - Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Katharina Riedel
  - Center for Functional Genomics of Microbes (CFGM), Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Thilo Muth
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Stephan Fuchs
  - Department of Infectious Diseases, Robert Koch Institute, Wernigerode, Germany
|
200
|
Taylor EM, Byrum SD, Edmondson JL, Wardell CP, Griffin BG, Shalin SC, Gokden M, Makhoul I, Tackett AJ, Rodriguez A. Proteogenomic analysis of melanoma brain metastases from distinct anatomical sites identifies pathways of metastatic progression. Acta Neuropathol Commun 2020; 8:157. [PMID: 32891176 PMCID: PMC7487560 DOI: 10.1186/s40478-020-01029-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/18/2020] [Accepted: 08/27/2020] [Indexed: 02/08/2023] Open
Abstract
Melanoma brain metastases (MBM) portend a grim prognosis and can occur in up to 40% of melanoma patients. Genomic characterization of brain metastases has been previously carried out to identify potential mutational drivers. However, to date a comprehensive multi-omics approach has yet to be used to analyze brain metastases. In this case report, we present an unbiased proteogenomic analysis of a patient's primary skin cancer and three brain metastases from distinct anatomic locations. We performed molecular profiling comprising a targeted DNA panel and full transcriptome as well as proteomics using mass spectrometry. Phylogeny demonstrated that all MBMs shared a SMARCA4 mutation and deletion of 12q. Proteogenomics identified multiple pathways upregulated in the MBMs compared to the primary tumor. The protein PIK3CG was present in many of these pathways and had increased gene expression in metastatic melanoma tissue in The Cancer Genome Atlas data. Proteomics demonstrated that PIK3CG levels were significantly increased in all 3 MBMs, and this finding was further validated by immunohistochemistry. In summary, this case report highlights the potential role of proteogenomics in identifying pathways involved in metastatic tumor progression. Furthermore, our multi-omics approach can be considered to aid in precision oncology efforts and provide avenues for therapeutic innovation.
Affiliation(s)
- Erin M Taylor
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Stephanie D Byrum
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Jacob L Edmondson
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Christopher P Wardell
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Brittany G Griffin
  - Department of Neurosurgery, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Sara C Shalin
  - Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Murat Gokden
  - Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Issam Makhoul
  - Department of Medical Oncology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Alan J Tackett
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Analiz Rodriguez
  - Department of Neurosurgery, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
|