1
Gulhane P, Singh S. Unraveling the Post-Translational Modifications and therapeutical approach in NSCLC pathogenesis. Transl Oncol 2023; 33:101673. [PMID: 37062237 PMCID: PMC10133877 DOI: 10.1016/j.tranon.2023.101673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/09/2023] [Accepted: 04/10/2023] [Indexed: 04/18/2023] Open
Abstract
Non-Small Cell Lung Cancer (NSCLC) is the most prevalent kind of lung cancer with around 85% of total lung cancer cases. Despite vast therapies being available, the survival rate is low (5 year survival rate is 15%) making it essential to comprehend the mechanism for NSCLC cell survival and progression. The plethora of evidences suggests that the Post Translational Modification (PTM) such as phosphorylation, methylation, acetylation, glycosylation, ubiquitination and SUMOylation are involved in various types of cancer progression and metastasis including NSCLC. Indeed, an in-depth understanding of PTM associated with NSCLC biology will provide novel therapeutic targets and insight into the current sophisticated therapeutic paradigm. Herein, we reviewed the key PTMs, epigenetic modulation, PTMs crosstalk along with proteogenomics to analyze PTMs in NSCLC and also, highlighted how epi‑miRNA, miRNA and PTM inhibitors are key modulators and serve as promising therapeutics.
Affiliation(s)
- Pooja Gulhane
  - National Centre for Cell Science, NCCS Complex, Ganeshkhind, SPPU Campus, Pune 411007, India
- Shailza Singh
  - National Centre for Cell Science, NCCS Complex, Ganeshkhind, SPPU Campus, Pune 411007, India
2
Harney DJ, Larance M. Annotated Protein Database Using Known Cleavage Sites for Rapid Detection of Secreted Proteins. J Proteome Res 2022; 21:965-974. [DOI: 10.1021/acs.jproteome.1c00806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Dylan J. Harney
  - Charles Perkins Centre and School of Life and Environmental Sciences, University of Sydney, 2006 Sydney, Australia
- Mark Larance
  - Charles Perkins Centre and School of Life and Environmental Sciences, University of Sydney, 2006 Sydney, Australia
  - Charles Perkins Centre and School of Medical Sciences, University of Sydney, 2006 Sydney, Australia
3
Hyung D, Baek MJ, Lee J, Cho J, Kim HS, Park C, Cho SY. Protein-gene Expression Nexus: Comprehensive characterization of human cancer cell lines with proteogenomic analysis. Comput Struct Biotechnol J 2021; 19:4759-4769. [PMID: 34504668 PMCID: PMC8405889 DOI: 10.1016/j.csbj.2021.08.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/28/2021] [Revised: 08/13/2021] [Accepted: 08/14/2021] [Indexed: 12/30/2022] Open
Abstract
Researchers have gained new therapeutic insights using multi-omics platform approaches to study DNA, RNA, and proteins of comprehensively characterized human cancer cell lines. To improve our understanding of the molecular features associated with oncogenic modulation in cancer, we proposed a proteogenomic database for human cancer cell lines, called Protein-gene Expression Nexus (PEN). We have expanded the characterization of cancer cell lines to include genetic, mRNA, and protein data of 145 cancer cell lines from various public studies. PEN contains proteomic and phosphoproteomic data on 4,129,728 peptides, 13,862 proteins, 7,138 phosphorylation site-associated genomic variations, 117 studies, and 12 cancer. We analyzed functional characterizations along with the integrated datasets, such as cis/trans association for copy number alteration (CNA), single amino acid variation for coding genes, post-translation modification site variation for Single Amino Acid Variation, and novel peptide expression for noncoding regions and fusion genes. PEN provides a user-friendly interface for searching, browsing, and downloading data and also supports the visualization of genome-wide association between CNA and expression, novel peptide landscape, mRNA-protein abundance, and functional annotation. Together, this dataset and PEN data portal provide a resource to accelerate cancer research using model cancer cell lines. PEN is freely accessible at http://combio.snu.ac.kr/pen.
Affiliation(s)
- Daejin Hyung
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Min-Jeong Baek
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Jongkeun Lee
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Juyeon Cho
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Hyoun Sook Kim
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Charny Park
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Soo Young Cho
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
  - Department of Molecular and Life Science, Hanyang University, Ansan 15588, Republic of Korea
4
Cesnik AJ, Miller RM, Ibrahim K, Lu L, Millikin RJ, Shortreed MR, Frey BL, Smith LM. Spritz: A Proteogenomic Database Engine. J Proteome Res 2020; 20:1826-1834. [PMID: 32967423 DOI: 10.1021/acs.jproteome.0c00407] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023]
Abstract
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
Affiliation(s)
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 17121, Sweden
  - Department of Genetics, Stanford University, Stanford, California 94305, United States
  - Chan Zuckerberg Biohub, San Francisco, California 94158, United States
- Rachel M Miller
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Khairina Ibrahim
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lei Lu
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Robert J Millikin
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Brian L Frey
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
5
Ramesh P, Nagarajan V, Khanchandani V, Desai VK, Niranjan V. Proteomic variations of esophageal squamous cell carcinoma revealed by combining RNA-seq proteogenomics and G-PTM search strategy. Heliyon 2020; 6:e04813. [PMID: 32913912 PMCID: PMC7472856 DOI: 10.1016/j.heliyon.2020.e04813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/24/2020] [Revised: 07/10/2020] [Accepted: 08/25/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND: Cancer that arises from epithelial cells of the esophagus is called esophagus squamous cell carcinoma (ESCC) and is mostly observed in developing nations. Evaluation of cancer genomes and its regulation into proteins plays a predominant role in understanding the cancer progressions. Mass-spectrometry-based proteomics is a consequential tool to estimate proteomic variation and posttranslational modifications (PTMs) from standard protein databases. Post-translational modifications play a crucial role in protein folding and PTMs can be accounted for as a biological signal to interpret the structural changes and transition order of proteins. Functional validation of cancer-related mutations can explain the effects of mutations on genes and the identification of Oncogenes and tumor suppressor genes. Therefore, we present a study on protein variations to interpret the structural changes and transition order of proteins in ESCC carcinogenesis. METHODOLOGY: We are using a bottom-up proteomics approach with Galaxy-P framework and RNA sequence data analysis to generate the sample-specific databases containing details of RNA splicing and variant peptides. Once the database generated with information on variable modification, only the curated PTMs at specific positions are considered to perform spectral matching. Proteogenomics mapping was performed to identify protein variations in ESCC. RESULTS: RNA-sequence proteogenomics with G-PTM (Global Post-Translational Modification) searching strategy has revealed proteomic events including several peptides that contain single amino acid variations, novel splice junction peptides and posttranslationally modified peptides. Proteogenomic mapping exhibited the splice junction peptides mapped predominantly for Malic enzyme exon type (ME-3) and MCM7 protein-coding genes that promote cancer progression, found to be exhibited in ESCC samples. Approximately 25 ± types of PTM modifications were recorded, and Protein Phosphorylation was largely noted. CONCLUSION: ESCC cancer prognosis at the molecular level enables a better understanding of cancer carcinogenesis and protein modifications can be used as potential biomarkers.
Affiliation(s)
- Pooja Ramesh
  - Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
- Vartika Khanchandani
  - Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
- Vasanth Kumar Desai
  - Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
- Vidya Niranjan
  - Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
6
Hubler SL, Kumar P, Mehta S, Easterly C, Johnson JE, Jagtap PD, Griffin TJ. Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits. J Proteome Res 2019; 19:161-173. [DOI: 10.1021/acs.jproteome.9b00478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
7
Myburg AA, Hussey SG, Wang JP, Street NR, Mizrachi E. Systems and Synthetic Biology of Forest Trees: A Bioengineering Paradigm for Woody Biomass Feedstocks. Front Plant Sci 2019; 10:775. [PMID: 31281326 PMCID: PMC6597874 DOI: 10.3389/fpls.2019.00775] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/22/2018] [Accepted: 05/28/2019] [Indexed: 05/07/2023]
Abstract
Fast-growing forest plantations are sustainable feedstocks of plant biomass that can serve as alternatives to fossil carbon resources for materials, chemicals, and energy. Their ability to efficiently harvest light energy and carbon from the atmosphere and sequester this into metabolic precursors for lignocellulosic biopolymers and a wide range of plant specialized metabolites make them excellent biochemical production platforms and living biorefineries. Their large sizes have facilitated multi-omics analyses and systems modeling of key biological processes such as lignin biosynthesis in trees. High-throughput 'omics' approaches have also been applied in segregating tree populations where genetic variation creates abundant genetic perturbations of system components allowing construction of systems genetics models linking genes and pathways to complex trait variation. With this information in hand, it is now possible to start using synthetic biology and genome editing techniques in a bioengineering approach based on a deeper understanding and rational design of biological parts, devices, and integrated systems. However, the complexity of the biology and interacting components will require investment in big data informatics, machine learning, and intuitive visualization to fully explore multi-dimensional patterns and identify emergent properties of biological systems. Predictive systems models could be tested rapidly through high-throughput synthetic biology approaches and multigene editing. Such a bioengineering paradigm, together with accelerated genomic breeding, will be crucial for the development of a new generation of woody biorefinery crops.
Affiliation(s)
- Alexander A. Myburg
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Hatfield, South Africa
- Steven G. Hussey
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Hatfield, South Africa
- Jack P. Wang
  - Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States
- Nathaniel R. Street
  - Umeå Plant Science Center, Department of Plant Physiology, Umeå University, Umeå, Sweden
- Eshchar Mizrachi
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Hatfield, South Africa
8
Li S, Cha SW, Heffner K, Hizal DB, Bowen MA, Chaerkady R, Cole RN, Tejwani V, Kaushik P, Henry M, Meleady P, Sharfstein ST, Betenbaugh MJ, Bafna V, Lewis NE. Proteogenomic Annotation of Chinese Hamsters Reveals Extensive Novel Translation Events and Endogenous Retroviral Elements. J Proteome Res 2019; 18:2433-2445. [PMID: 31020842 DOI: 10.1021/acs.jproteome.8b00935] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/22/2022]
Abstract
A high-quality genome annotation greatly facilitates successful cell line engineering. Standard draft genome annotation pipelines are based largely on de novo gene prediction, homology, and RNA-Seq data. However, draft annotations can suffer from incorrect predictions of translated sequence, inaccurate splice isoforms, and missing genes. Here, we generated a draft annotation for the newly assembled Chinese hamster genome and used RNA-Seq, proteomics, and Ribo-Seq to experimentally annotate the genome. We identified 3529 new proteins compared to the hamster RefSeq protein annotation and 2256 novel translational events (e.g., alternative splices, mutations, and novel splices). Finally, we used this pipeline to identify the source of translated retroviruses contaminating recombinant products from Chinese hamster ovary (CHO) cell lines, including 119 type-C retroviruses, thus enabling future efforts to eliminate retroviruses to reduce the costs incurred with retroviral particle clearance. In summary, the improved annotation provides a more accurate resource for CHO cell line engineering, by facilitating the interpretation of omics data, defining of cellular pathways, and engineering of complex phenotypes.
Affiliation(s)
- Deniz Baycin Hizal
  - Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Michael A Bowen
  - Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Raghothama Chaerkady
  - Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Vijay Tejwani
  - Colleges of Nanoscale Science and Engineering, SUNY Polytechnic Institute, Albany, New York 12203, United States
- Prashant Kaushik
  - National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Michael Henry
  - National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Paula Meleady
  - National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Susan T Sharfstein
  - Colleges of Nanoscale Science and Engineering, SUNY Polytechnic Institute, Albany, New York 12203, United States
9
Wang Q, Peng WX, Wang L, Ye L. Toward multiomics-based next-generation diagnostics for precision medicine. Per Med 2019; 16:157-170. [PMID: 30816060 DOI: 10.2217/pme-2018-0085] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Our healthcare system is experiencing a paradigm shift to precision medicine, aiming at an early prediction of individual disease risks and targeted interventions. Whole-genome sequencing is currently gaining momentum, as it has the potential to capture all classes of genetic variation, thus providing a more complete picture of the individual's genetic makeup, which could be utilized in genetic testing; however, this will also lead to difficulties in interpreting the test results, necessitating careful integration of genomic data with other layers of information, both molecular multiomics measurements of epigenome, transcriptome, proteome, metabolome and even microbiome, as well as comprehensive information on diet, lifestyle and environment. Overall, the translation of patient-specific data into actionable diagnostic tools will be a challenging task, requiring expertise from multiple disciplines, secure data sharing in large reference databases and a strong computational infrastructure.
Affiliation(s)
- Qi Wang
  - Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Wei-Xian Peng
  - Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Lu Wang
  - Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Li Ye
  - Department of Nursing, Tongde Hospital of Zhejiang Province, Hangzhou 310012, Zhejiang Province, China
10
González-Gomariz J, Guruceaga E, López-Sánchez M, Segura V. Proteogenomics in the context of the Human Proteome Project (HPP). Expert Rev Proteomics 2019; 16:267-275. [PMID: 30654666 DOI: 10.1080/14789450.2019.1571916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
INTRODUCTION: The technological and scientific progress performed in the Human Proteome Project (HPP) has provided to the scientific community a new set of experimental and bioinformatic methods in the challenging field of shotgun and SRM/MRM-based Proteomics. The requirements for a protein to be considered experimentally validated are now well-established, and the information about the human proteome is available in the neXtProt database, while targeted proteomic assays are stored in SRMAtlas. However, the study of the missing proteins continues being an outstanding issue. Areas covered: This review is focused on the implementation of proteogenomic methods designed to improve the detection and validation of the missing proteins. The evolution of the methodological strategies based on the combination of different omic technologies and the use of huge publicly available datasets is shown taking the Chromosome 16 Consortium as reference. Expert commentary: Proteogenomics and other strategies of data analysis implemented within the C-HPP initiative could be used as guidance to complete in a near future the catalog of the human proteins. Besides, in the next years, we will probably witness their use in the B/D-HPP initiative to go a step forward on the implications of the proteins in the human biology and disease.
Affiliation(s)
- José González-Gomariz
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
- Elizabeth Guruceaga
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
- Macarena López-Sánchez
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
- Victor Segura
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
11
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next‐Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
12
Cifani P, Dhabaria A, Chen Z, Yoshimi A, Kawaler E, Abdel-Wahab O, Poirier JT, Kentsis A. ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching. J Proteome Res 2018; 17:3681-3692. [PMID: 30295032 DOI: 10.1021/acs.jproteome.8b00295] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
Modern mass spectrometry now permits genome-scale and quantitative measurements of biological proteomes. However, analysis of specific specimens is currently hindered by the incomplete representation of biological variability of protein sequences in canonical reference proteomes and the technical demands for their construction. Here, we report ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-accuracy mass spectrometry proteomics. This enables the assembly of proteomes encoded by actively transcribed genes, including sample-specific protein isoforms resulting from non-canonical mRNA transcription, splicing, or editing. To improve the accuracy of protein isoform identification in non-canonical proteomes, ProteomeGenerator relies on statistical target-decoy database matching calibrated using sample-specific controls. Its current implementation includes automatic integration with MaxQuant mass spectrometry proteomics algorithms. We applied this method for the proteogenomic analysis of splicing factor SRSF2 mutant leukemia cells, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing as well as improved accuracy of genome-scale proteome discovery. Additionally, we report proteogenomic performance metrics for current state-of-the-art implementations of SEQUEST HT, MaxQuant, Byonic, and PEAKS mass spectral analysis algorithms. Finally, ProteomeGenerator is implemented as a Snakemake workflow within a Singularity container for one-step installation in diverse computing environments, thereby enabling open, scalable, and facile discovery of sample-specific, non-canonical, and neomorphic biological proteomes.
Affiliation(s)
- Paolo Cifani
  - Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, New York 10065, United States
- Avantika Dhabaria
  - Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, New York 10065, United States
- Zining Chen
  - Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, New York 10065, United States
- Omar Abdel-Wahab
  - Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Health, New York City, New York 10016, United States
- John T Poirier
  - Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, New York 10065, United States
  - Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Health, New York City, New York 10016, United States
- Alex Kentsis
  - Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, New York 10065, United States
  - Departments of Pediatrics, Pharmacology, and Physiology & Biophysics, Weill Cornell Medical College, Cornell University, New York, New York 10065, United States
13
Kiseleva OI, Lisitsa AV, Poverennaya EV. Proteoforms: Methods of Analysis and Clinical Prospects. Mol Biol 2018. [DOI: 10.1134/s0026893318030068] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
14
Lu L, Millikin RJ, Solntsev SK, Rolfs Z, Scalf M, Shortreed MR, Smith LM. Identification of MS-Cleavable and Noncleavable Chemically Cross-Linked Peptides with MetaMorpheus. J Proteome Res 2018; 17:2370-2376. [PMID: 29793340 DOI: 10.1021/acs.jproteome.8b00141] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/14/2022]
Abstract
Protein chemical cross-linking combined with mass spectrometry has become an important technique for the analysis of protein structure and protein-protein interactions. A variety of cross-linkers are well developed, but reliable, rapid, and user-friendly tools for large-scale analysis of cross-linked proteins are still in need. Here we report MetaMorpheusXL, a new search module within the MetaMorpheus software suite that identifies both MS-cleavable and noncleavable cross-linked peptides in MS data. MetaMorpheusXL identifies MS-cleavable cross-linked peptides with an ion-indexing algorithm, which enables an efficient large database search. The identification does not require the presence of signature fragment ions, an advantage compared with similar programs such as XlinkX. One complication associated with the need for signature ions from cleavable cross-linkers such as DSSO (disuccinimidyl sulfoxide) is the requirement for multiple fragmentation types and energy combinations, which is not necessary for MetaMorpheusXL. The ability to perform proteome-wide analysis is another advantage of MetaMorpheusXL compared with programs such as MeroX and DXMSMS. MetaMorpheusXL is also faster than other currently available MS-cleavable cross-link search software programs. It is imbedded in MetaMorpheus, an open-source and freely available software suite that provides a reliable, fast, user-friendly graphical user interface that is readily accessible to researchers.
15
Lin YY, Gawronski A, Hach F, Li S, Numanagić I, Sarrafi I, Mishra S, McPherson A, Collins CC, Radovich M, Tang H, Sahinalp SC. Computational identification of micro-structural variations and their proteogenomic consequences in cancer. Bioinformatics 2018; 34:1672-1681. [PMID: 29267878 PMCID: PMC5946953 DOI: 10.1093/bioinformatics/btx807] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/25/2017] [Revised: 11/24/2017] [Accepted: 12/15/2017] [Indexed: 12/18/2022] Open
Abstract
Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples.
Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure.
Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.
Contact: cenksahi@indiana.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
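The idea of finding peptides that span a breakpoint junction can be illustrated with a minimal sketch. This is not the ProTIE implementation: the function names, the simplified trypsin rule (cleave after K or R unless followed by P) and the `min_len` threshold are assumptions made only for illustration.

```python
import re

def tryptic_peptides(seq):
    """Yield (start, end, peptide) for a simplified tryptic digest.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    start = 0
    for match in re.finditer(r"(?<=[KR])(?!P)", seq):
        end = match.start()
        if end > start:
            yield start, end, seq[start:end]
        start = end
    if start < len(seq):
        yield start, len(seq), seq[start:]

def junction_peptides(left, right, min_len=6):
    """Return peptides of the fused protein that contain residues from both partners.

    `left` and `right` are hypothetical translated sequences on either side of
    the breakpoint; `min_len` is an illustrative length filter.
    """
    fused = left + right
    junction = len(left)  # index of the first residue contributed by `right`
    return [
        pep
        for start, end, pep in tryptic_peptides(fused)
        if start < junction < end and len(pep) >= min_len
    ]

print(junction_peptides("MAAGKLLSE", "DTPWRQQK"))
```

Only peptides that straddle the junction are informative here, because peptides lying entirely on one side are also produced by the unrearranged proteins.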
Affiliation(s)
- Yen-Yi Lin
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
  - Vancouver Prostate Centre, Vancouver, BC, Canada
- Faraz Hach
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
- Sujun Li
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Ibrahim Numanagić
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Iman Sarrafi
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
  - Vancouver Prostate Centre, Vancouver, BC, Canada
- Swati Mishra
  - Department of Surgery, Indiana University, School of Medicine, Indianapolis, IN, USA
- Andrew McPherson
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Colin C Collins
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
- Milan Radovich
  - Department of Surgery, Indiana University, School of Medicine, Indianapolis, IN, USA
- Haixu Tang
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- S Cenk Sahinalp
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
|
16
|
Solntsev SK, Shortreed MR, Frey BL, Smith LM. Enhanced Global Post-translational Modification Discovery with MetaMorpheus. J Proteome Res 2018; 17:1844-1851. [PMID: 29578715 DOI: 10.1021/acs.jproteome.7b00873] [Citation(s) in RCA: 168] [Impact Index Per Article: 28.0] [Indexed: 01/03/2023]
Abstract
Correct identification of protein post-translational modifications (PTMs) is crucial to understanding many aspects of protein function in biological processes. G-PTM-D is a recently developed technique for global identification and localization of PTMs. Spectral file calibration prior to applying G-PTM-D, and algorithmic enhancements in the peptide database search significantly increase the accuracy, speed, and scope of PTM identification. We enhance G-PTM-D by using multinotch searches and demonstrate its effectiveness in identification of numerous types of PTMs including high-mass modifications such as glycosylations. The changes described in this work lead to a 20% increase in the number of identified modifications and an order of magnitude decrease in search time. The complete workflow is implemented in MetaMorpheus, a software tool that integrates the database search procedure, identification of coisolated peptides, spectral calibration, and the enhanced G-PTM-D workflow. Multinotch searches are also shown to be useful in contexts other than G-PTM-D by producing superior results when used instead of standard narrow-window and open database searches.
|
17
|
Proffitt JM, Glenn J, Cesnik AJ, Jadhav A, Shortreed MR, Smith LM, Kavanagh K, Cox LA, Olivier M. Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys. BMC Genomics 2017; 18:877. [PMID: 29132314 PMCID: PMC5683380 DOI: 10.1186/s12864-017-4279-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/07/2017] [Accepted: 11/03/2017] [Indexed: 01/05/2023]
Abstract
Background: Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq to improve proteomics search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data are capable of identifying unannotated genetic and splice variants while simultaneously reducing the number of comparisons to only those transcripts actively expressed in the tissue.
Results: We collected RNA-Seq and proteomic data from 10 vervet monkey liver samples and used the RNA-Seq data to curate sample-specific search databases which were analyzed in the program Morpheus. We compared these results against those from a search database generated from the reference vervet genome. A total of 284 previously unannotated splice junctions were predicted by the RNA-Seq data, 92 of which were confirmed by peptide spectral matches. More than half (53/92) of these unannotated splice variants had orthologs in other non-human primates, suggesting that failure to match these peptides in the reference analyses likely arose from incomplete gene model information. The sample-specific databases also identified 101 unique peptides containing single amino acid substitutions which were missed by the reference database. Because the sample-specific searches were restricted to actively expressed transcripts, the search databases were smaller, more computationally efficient, and identified more peptides at the empirically derived 1% false discovery rate.
Conclusion: Proteogenomic approaches are ideally suited to facilitate the discovery and annotation of proteins in less widely studied animal models such as non-human primates. We expect that these approaches will help to improve existing genome annotations of non-human primate species such as vervet.
Electronic supplementary material: The online version of this article (doi: 10.1186/s12864-017-4279-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- J Michael Proffitt
  - Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
- Jeremy Glenn
  - Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin, USA
- Avinash Jadhav
  - Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
  - Current address: Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, NRC Building, G-55, Winston-Salem, North Carolina, 27157, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin, USA
  - Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin, USA
- Kylie Kavanagh
  - Department of Pathology and Comparative Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Laura A Cox
  - Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
  - Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas, USA
- Michael Olivier
  - Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
  - Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas, USA
  - Current address: Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, NRC Building, G-55, Winston-Salem, North Carolina, 27157, USA
|
18
|
Detecting protein variants by mass spectrometry: a comprehensive study in cancer cell-lines. Genome Med 2017; 9:62. [PMID: 28716134 PMCID: PMC5514513 DOI: 10.1186/s13073-017-0454-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 03/03/2017] [Accepted: 06/22/2017] [Indexed: 02/07/2023]
Abstract
Background: Onco-proteogenomics aims to understand how changes in a cancer's genome influence its proteome. One challenge in integrating these molecular data is the identification of aberrant protein products from mass-spectrometry (MS) datasets, as traditional proteomic analyses only identify proteins from a reference sequence database.
Methods: We established proteomic workflows to detect peptide variants within MS datasets. We used a combination of publicly available population variants (dbSNP and UniProt) and somatic variations in cancer (COSMIC) along with sample-specific genomic and transcriptomic data to examine proteome variation within and across 59 cancer cell-lines.
Results: We developed a set of recommendations for the detection of variants using three search algorithms, a split target-decoy approach for FDR estimation, and multiple post-search filters. We examined 7.3 million unique variant tryptic peptides not found within any reference proteome and identified 4771 mutations corresponding to somatic and germline deviations from reference proteomes in 2200 genes among the NCI60 cell-line proteomes.
Conclusions: We discuss in detail the technical and computational challenges in identifying variant peptides by MS and show that uncovering these variants allows the identification of druggable mutations within important cancer genes.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-017-0454-9) contains supplementary material, which is available to authorized users.
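Target-decoy FDR estimation, as mentioned in the Results, can be sketched in a few lines. The abstract does not define "split"; the sketch below assumes one plausible reading, namely that FDR is estimated separately for variant and reference peptide-spectrum matches (PSMs). The dictionary keys `score`, `decoy` and `variant` and both function names are hypothetical.

```python
def passing_targets(psms, fdr=0.01):
    """Return target PSMs above the deepest score cutoff where decoys/targets <= fdr."""
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs accepted
    for rank, psm in enumerate(ranked, start=1):
        if psm["decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p["decoy"]]

def split_fdr(psms, fdr=0.01):
    """Assumed 'split' scheme: apply the FDR filter to each class independently."""
    groups = {}
    for psm in psms:
        groups.setdefault(psm["variant"], []).append(psm)
    return {is_variant: passing_targets(group, fdr) for is_variant, group in groups.items()}

example = [
    {"score": 10, "decoy": False, "variant": False},
    {"score": 9, "decoy": False, "variant": False},
    {"score": 8, "decoy": True, "variant": False},
    {"score": 7, "decoy": False, "variant": False},
    {"score": 6, "decoy": False, "variant": True},
    {"score": 5, "decoy": True, "variant": True},
]
accepted = split_fdr(example, fdr=0.01)
print({k: len(v) for k, v in accepted.items()})
```

Filtering the two classes separately keeps the far larger pool of reference matches from masking a higher error rate among the rarer variant matches.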
|
19
|
Post-translational modifications of FDA-approved plasma biomarkers in glioblastoma samples. PLoS One 2017; 12:e0177427. [PMID: 28493947 PMCID: PMC5426747 DOI: 10.1371/journal.pone.0177427] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 12/06/2016] [Accepted: 04/27/2017] [Indexed: 01/08/2023]
Abstract
Liquid chromatography-tandem mass spectrometry was used to analyze plasma proteins of volunteers (control) and patients with glioblastoma multiforme (GBM). A database search was pre-set with a variable post-translational modification (PTM): phosphorylation, acetylation or ubiquitination. There were no significant differences between the control and the GBM groups regarding the number of protein identifications, sequence coverage or number of PTMs. However, in GBM plasma, we unambiguously observed a decreased fraction of post-translationally modified peptides identified with high quality. The disease-specific PTM patterns were extracted and mapped to the set of FDA-approved plasma protein markers. Decreases of 46% and 24% in the number of acetylated and ubiquitinated peptides, respectively, were observed in the GBM samples. The significance of capturing disease-associated patterns of protein modifications was envisaged.
|
20
|
Willems P, Ndah E, Jonckheere V, Stael S, Sticker A, Martens L, Van Breusegem F, Gevaert K, Van Damme P. N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana. Mol Cell Proteomics 2017; 16:1064-1080. [PMID: 28432195 PMCID: PMC5461538 DOI: 10.1074/mcp.m116.066662] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 12/22/2016] [Revised: 04/11/2017] [Indexed: 01/05/2023]
Abstract
Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes.
Affiliation(s)
- Patrick Willems
  - VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium
  - Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Elvis Ndah
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
  - Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Veronique Jonckheere
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Simon Stael
  - VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium
  - Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Adriaan Sticker
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
  - Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Lennart Martens
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
  - Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Frank Van Breusegem
  - VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium
  - Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent
- Kris Gevaert
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Petra Van Damme
  - VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
  - Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
|
21
|
Li Q, Shortreed MR, Wenger CD, Frey BL, Schaffer LV, Scalf M, Smith LM. Global Post-Translational Modification Discovery. J Proteome Res 2017; 16:1383-1390. [PMID: 28248113 PMCID: PMC5387672 DOI: 10.1021/acs.jproteome.6b00034] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 12/22/2022]
Abstract
A new global post-translational modification (PTM) discovery strategy, G-PTM-D, is described. A proteomics database containing UniProt-curated PTM information is supplemented with potential new modification types and sites discovered from a first-round search of mass spectrometry data with ultrawide precursor mass tolerance. A second-round search employing the supplemented database, conducted with standard narrow mass tolerances, yields deep coverage and a rich variety of peptide modifications with high confidence in complex unenriched samples. The G-PTM-D strategy represents a major advance over the previously reported G-PTM strategy and provides a powerful new capability to the proteomics research community.
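The first-round step, in which a wide precursor tolerance exposes mass differences that may correspond to modifications, can be illustrated roughly as follows. This is a sketch under stated assumptions and not the G-PTM-D or MetaMorpheus code: the small `KNOWN_MODS` table (standard monoisotopic mass shifts in Da), the 0.01 Da tolerance and the function name are all chosen for illustration.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
# The selection is illustrative; a real search would use a full modification list.
KNOWN_MODS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01056,
    "methylation": 14.01565,
    "oxidation": 15.99491,
}

def candidate_mods(observed_mass, theoretical_mass, tol=0.01):
    """Return names of known modifications whose mass shift explains the difference
    between the observed precursor mass and the unmodified peptide mass."""
    delta = observed_mass - theoretical_mass
    return sorted(
        name for name, shift in KNOWN_MODS.items() if abs(delta - shift) <= tol
    )

# A hypothetical peptide of mass 1000.0 Da observed about 79.966 Da heavier.
print(candidate_mods(1079.9665, 1000.0))
```

Candidates found this way would then be added to the search database so that the second-round search, run with a narrow tolerance, can confirm or reject them.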
Affiliation(s)
- Qiyao Li
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Brian L Frey
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Leah V Schaffer
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Mark Scalf
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
|
22
|
Kumar D, Bansal G, Narang A, Basak T, Abbas T, Dash D. Integrating transcriptome and proteome profiling: Strategies and applications. Proteomics 2016; 16:2533-2544. [PMID: 27343053 DOI: 10.1002/pmic.201600140] [Citation(s) in RCA: 108] [Impact Index Per Article: 13.5] [Received: 03/16/2016] [Revised: 06/12/2016] [Accepted: 06/23/2016] [Indexed: 12/17/2022]
Abstract
Discovering the gene expression signature associated with a cellular state is one of the basic quests in the majority of biological studies. For most clinical and cellular manifestations, these molecular differences may be exhibited across multiple layers of gene regulation, such as genomic variations, gene expression, protein translation and post-translational modifications. These system-wide variations are dynamic in nature and their crosstalk is overwhelmingly complex, so analyzing them separately may not be very informative. This necessitates the integrative analysis of such multiple layers of information to understand the interplay of the individual components of the biological system. Recent developments in high-throughput RNA sequencing and mass spectrometric (MS) technologies to probe transcripts and proteins have made these the preferred methods for understanding global gene regulation. Subsequently, improvements in "big-data" analysis techniques have enabled novel conclusions to be drawn from integrative transcriptomic-proteomic analysis. The unified analyses of both these data types have been rewarding for several biological objectives, such as improving genome annotation, predicting RNA-protein quantities, deciphering gene regulations, and discovering disease markers and drug targets. There are different ways in which transcriptomics and proteomics data can be integrated, each aiming for different research objectives. Here, we review various studies, approaches and computational tools targeted for integrative analysis of these two high-throughput omics methods.
Affiliation(s)
- Dhirendra Kumar
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Gourja Bansal
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Ankita Narang
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Trayambak Basak
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Tahseen Abbas
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Debasis Dash
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
|
23
|
Kohlbacher O, Vitek O, Weintraub ST. Challenges in Large-Scale Computational Mass Spectrometry and Multiomics. J Proteome Res 2016; 15:681-2. [DOI: 10.1021/acs.jproteome.6b00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Oliver Kohlbacher
  - Center for Bioinformatics, Quantitative Biology Center, Department of Computer Science and Faculty of Medicine, University of Tübingen and Max Planck Institute for Developmental Biology
- Olga Vitek
  - Sy and Laurie Sternberg Interdisciplinary Associate Professor, College of Science, College of Computer and Information Science, Northeastern University
- Susan T. Weintraub
  - Department of Biochemistry, The University of Texas Health Science Center at San Antonio
|