1
Short Linear Motifs in Colorectal Cancer Interactome and Tumorigenesis. Cells 2022; 11:cells11233739. [PMID: 36496998 PMCID: PMC9737320 DOI: 10.3390/cells11233739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/16/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022] Open
Abstract
Colorectal tumorigenesis is driven by alterations in genes and proteins responsible for cancer initiation, progression, and invasion. This multistage process is based on a dense network of protein-protein interactions (PPIs) that become dysregulated as a result of changes in various cell signaling effectors. PPIs in signaling and regulatory networks are known to be mediated by short linear motifs (SLiMs), which are conserved contiguous regions of 3-10 amino acids within interacting protein domains. SLiMs are the minimum sequences required for modulating cellular PPI networks. Thus, several in silico approaches have been developed to predict and analyze SLiM-mediated PPIs. In this review, we focus on emerging evidence supporting a crucial role for SLiMs in driver pathways that are disrupted in colorectal cancer (CRC) tumorigenesis and related PPI network alterations. As a result, SLiMs, along with short peptides, are attracting the interest of researchers to devise small molecules amenable to be used as novel anti-CRC targeted therapies. Overall, the characterization of SLiMs mediating crucial PPIs in CRC may foster the development of more specific combined pharmacological approaches.
2
Zhang F, Deng CK, Wang M, Deng B, Barber R, Huang G. Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq. BMC Bioinformatics 2020; 21:541. [PMID: 33272210 PMCID: PMC7713335 DOI: 10.1186/s12859-020-03824-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/14/2020] [Accepted: 10/19/2020] [Indexed: 01/12/2023] Open
Abstract
Background: Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced, a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-Seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparisons of proteomics and microarray data have generally found poor correspondence, significantly higher degrees of correlation have been observed at the exon level. Combining protein and RNA data by searching LC–MS/MS data against a customized protein database derived from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates with higher confidence.
Results: We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC–MS/MS using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences in silico into Isoform Junction Peptides and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass Spectrometry Search Algorithm against the customized alternative splicing database with the breast cancer plasma proteome. Twenty-six alternative splicing biomarker peptides with one single intron event and one exon skipping event were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which is consistent with the 256 alternative splicing biomarkers from the RNA-Seq.
Conclusions: This paper presents a bioinformatics workflow for using RNA-Seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.
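The in silico translation step in this abstract (turning splice-junction sequences into junction peptides for a search database) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow: the function names, the three-frame forward translation, the `min_len` cutoff, and the toy exon sequences are all hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def junction_peptides(upstream_exon, downstream_exon, min_len=6):
    """Return (frame, peptide) pairs for stop-free peptides spanning the junction.

    The two exon sequences are joined, translated in the three forward frames,
    and split at stop codons. A peptide is kept only if it covers nucleotides
    on both sides of the exon-exon boundary and has at least min_len residues
    (min_len is an assumed cutoff, not a value from the paper).
    """
    joined = upstream_exon + downstream_exon
    junction = len(upstream_exon)  # first nucleotide of the downstream exon
    results = []
    for frame in range(3):
        residue_start = 0
        for segment in translate(joined[frame:]).split("*"):
            nt_start = frame + 3 * residue_start
            nt_end = nt_start + 3 * len(segment)
            if nt_start < junction < nt_end and len(segment) >= min_len:
                results.append((frame, segment))
            residue_start += len(segment) + 1  # skip past the stop codon
    return results


# Toy exons (hypothetical sequences).
candidates = junction_peptides("ATGGCTGAAACC", "GGTAAACTGTGA")
```

In a real workflow the retained peptides would be written to a FASTA file and supplied to the search engine as the customized database; that step is omitted here.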
Affiliation(s)
- Fan Zhang
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA.
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA.
- Chris K Deng
  - School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA.
- Mu Wang
  - Department of Biochemistry and Molecular Biology, IU School of Medicine, Indianapolis, IN, 46202, USA.
  - Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN, 46202, USA.
- Bin Deng
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA.
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA.
- Robert Barber
  - Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, TX, USA.
- Gang Huang
  - Shanghai Key Laboratory for Molecular Imaging, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, People's Republic of China.
3
Boonen K, Hens K, Menschaert G, Baggerman G, Valkenborg D, Ertaylan G. Beyond Genes: Re-Identifiability of Proteomic Data and Its Implications for Personalized Medicine. Genes (Basel) 2019; 10:E682. [PMID: 31492022 PMCID: PMC6770961 DOI: 10.3390/genes10090682] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/19/2019] [Revised: 08/30/2019] [Accepted: 09/01/2019] [Indexed: 02/07/2023] Open
Abstract
The increasing availability of high throughput proteomics data provides us with opportunities as well as posing new ethical challenges regarding data privacy and re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further exacerbates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we try to differentiate proteomics from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.
Affiliation(s)
- Kurt Boonen
  - VITO Health, Boeretang 200, Mol 2400, Belgium.
  - Centre for Proteomics, University of Antwerp, Antwerp 2020, Belgium.
- Kristien Hens
  - Department of Philosophy, University of Antwerp, Antwerp 2000, Belgium.
  - Institute of Philosophy, KU Leuven, Leuven 3000, Belgium.
- Gerben Menschaert
  - Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent 9000, Belgium.
- Geert Baggerman
  - VITO Health, Boeretang 200, Mol 2400, Belgium.
  - Centre for Proteomics, University of Antwerp, Antwerp 2020, Belgium.
4
Di C, Syafrizayanti, Zhang Q, Chen Y, Wang Y, Zhang X, Liu Y, Sun C, Zhang H, Hoheisel JD. Function, clinical application, and strategies of Pre-mRNA splicing in cancer. Cell Death Differ 2018; 26:1181-1194. [PMID: 30464224 PMCID: PMC6748147 DOI: 10.1038/s41418-018-0231-3] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 08/13/2018] [Revised: 10/09/2018] [Accepted: 10/23/2018] [Indexed: 12/22/2022] Open
Abstract
Pre-mRNA splicing is a fundamental process that plays a considerable role in generating protein diversity. Pre-mRNA splicing is also key to the pathology of numerous diseases, especially cancers. In this review, we discuss how aberrant splicing isoforms precisely regulate three basic functional aspects of cancer: proliferation, metastasis and apoptosis. Importantly, the clinical function of aberrant splicing isoforms is also discussed, in particular concerning drug resistance and radiosensitivity. Furthermore, this review discusses emerging strategies to modulate pathologic aberrant splicing isoforms, which are attractive, novel therapeutic targets in cancer. Last, we outline current and future directions of the isoform diagnostic methodologies reported so far in cancer. Thus, this review highlights the significance of aberrant splicing isoforms as markers for cancer and as targets for cancer therapy.
Affiliation(s)
- Cuixia Di
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Syafrizayanti
  - Division of Functional Genome Analysis, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120, Heidelberg, Germany.
  - Department of Chemistry, Faculty of Mathematics and Natural Sciences, Andalas University, Kampus Limau Manis, Padang, Indonesia.
- Qianjing Zhang
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Yuhong Chen
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Yupei Wang
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Xuetian Zhang
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Yang Liu
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
- Chao Sun
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
- Hong Zhang
  - Department of Radiation Medicine, Institute of Modern Physics, Chinese Academy of Sciences, 730000, Lanzhou, China.
  - Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, 730000, Lanzhou, China.
- Jörg D Hoheisel
  - Division of Functional Genome Analysis, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120, Heidelberg, Germany.
5
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data. G3: Genes, Genomes, Genetics 2018; 8:2923-2940. [PMID: 30021829 PMCID: PMC6118309 DOI: 10.1534/g3.118.200373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the average number of reads supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
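The two ideas in this abstract, a catalog of all logical junctions and a filter on the proportion of a transcript's events that were detected, can be illustrated with a small Python sketch. It is a simplified reading of the description, not the EA implementation: the function names, the event tuples, and the 0.75 threshold are assumptions, and the coverage criterion is left out.

```python
from itertools import combinations


def logical_junctions(exons):
    """Pair every exon end (donor) with every later exon start (acceptor).

    exons is a list of (start, end) coordinates for one gene on one strand.
    """
    ordered = sorted(exons)
    return [(a[1], b[0]) for a, b in combinations(ordered, 2) if a[1] < b[0]]


def transcript_events(exon_chain):
    """List the exon and junction events that make up one transcript."""
    exon_events = [("exon", start, end) for start, end in exon_chain]
    junction_events = [
        ("junction", a[1], b[0]) for a, b in zip(exon_chain, exon_chain[1:])
    ]
    return exon_events + junction_events


def keep_transcript(exon_chain, detected, min_fraction=0.75):
    """Keep a transcript if enough of its events were detected.

    min_fraction is an assumed threshold chosen for this example.
    """
    events = transcript_events(exon_chain)
    hits = sum(1 for event in events if event in detected)
    return hits / len(events) >= min_fraction


# Toy gene with three exons (hypothetical coordinates).
gene_exons = [(100, 200), (300, 400), (500, 600)]
catalog = logical_junctions(gene_exons)
```

Here the catalog includes the exon 1 to exon 3 junction even though no annotated transcript may use it, which is what lets reads supporting a novel exon-skipping event be counted.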
6
Liu Y, Gonzàlez-Porta M, Santos S, Brazma A, Marioni JC, Aebersold R, Venkitaraman AR, Wickramasinghe VO. Impact of Alternative Splicing on the Human Proteome. Cell Rep 2017; 20:1229-1241. [PMID: 28768205 PMCID: PMC5554779 DOI: 10.1016/j.celrep.2017.07.025] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 04/21/2017] [Revised: 06/02/2017] [Accepted: 07/12/2017] [Indexed: 02/02/2023] Open
Abstract
Alternative splicing is a critical determinant of genome complexity and, by implication, is assumed to engender proteomic diversity. This notion has not been experimentally tested in a targeted, quantitative manner. Here, we have developed an integrative approach to ask whether perturbations in mRNA splicing patterns alter the composition of the proteome. We integrate RNA sequencing (RNA-seq) (to comprehensively report intron retention, differential transcript usage, and gene expression) with a data-independent acquisition (DIA) method, SWATH-MS (sequential window acquisition of all theoretical spectra-mass spectrometry), to capture an unbiased, quantitative snapshot of the impact of constitutive and alternative splicing events on the proteome. Whereas intron retention is accompanied by decreased protein abundance, alterations in differential transcript usage and gene expression alter protein abundance in proportion to transcript levels. Our findings illustrate how RNA splicing links isoform expression in the human transcriptome with proteomic diversity and provide a foundation for studying perturbations associated with human diseases.
Affiliation(s)
- Yansheng Liu
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- Mar Gonzàlez-Porta
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
- Sergio Santos
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
- Alvis Brazma
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
- John C Marioni
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- Ashok R Venkitaraman
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK.
- Vihandha O Wickramasinghe
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK.
  - RNA Biology and Cancer Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC 3000, Australia.
7
Chen JY, Pandey R, Nguyen TM. HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions. BMC Genomics 2017; 18:182. [PMID: 28212602 PMCID: PMC5314692 DOI: 10.1186/s12864-017-3512-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/08/2015] [Accepted: 01/24/2017] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine.
RESULTS: The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPIs) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI.
CONCLUSIONS: While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.
Affiliation(s)
- Jake Y Chen
  - Wenzhou Medical University First Affiliate Hospital, Wenzhou, Zhejiang Province, China.
  - Medeolinx, LLC, Indianapolis, IN, 46280, USA.
  - The Informatics Institute, University of Alabama at Birmingham School of Medicine, Birmingham, AL, 35294, USA.
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA.
- Thanh M Nguyen
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA.
8
A method for identifying discriminative isoform-specific peptides for clinical proteomics application. BMC Genomics 2016; 17 Suppl 7:522. [PMID: 27557076 PMCID: PMC5001247 DOI: 10.1186/s12864-016-2907-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/22/2022] Open
Abstract
Background: Clinical proteomics aims at solving a specific clinical problem within the context of a clinical study. It has been growing rapidly in the field of biomarker discovery, especially in the area of cancer diagnostics. Until recently, protein isoforms had not been viewed as a new class of early diagnostic biomarkers for clinical proteomics. A protein isoform is one of several different forms of the same protein. Different forms of a protein may be produced from single-nucleotide polymorphisms (SNPs), alternative splicing, or post-translational modifications (PTMs). Previous studies have shown that protein isoforms play critical roles in tumorigenesis, disease diagnosis, and prognosis. Identifying and characterizing protein isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, the traditional methods used for protein isoform determination, such as EST sequencing, microarray profiling (exon array, exon-exon junction array), and mRNA next-generation sequencing, have limitations: 1) they are not at the protein level, 2) they give no information about the connection of nonadjacent exons, 3) they cover no SNPs or PTMs, and 4) they have low reproducibility. Moreover, clinical proteomics studies pose computational challenges: 1) low sensitivity of instruments, 2) high data noise, and 3) high variability and low repeatability, even though recent advances in clinical proteomics technology, notably LC-MS/MS proteomics, have been used to identify candidate molecular biomarkers in a diverse range of samples, including cells, tissues, serum/plasma, and other types of body fluids.
Results: We present a peptidomics method for identifying cancer-related and isoform-specific peptides for clinical proteomics applications from LC-MS/MS. First, we built a Peptidomic Database of Human Protein Isoforms, then created a peptidomics approach to perform a large-scale screen of breast cancer-associated alternative splicing isoform markers in clinical proteomics, and lastly performed four kinds of validation: biological validation (explainable index), exon array, statistical validation of independent samples, and extensive pathway analysis.
Conclusions: Our results showed that alternative splicing isoform markers can act as independent markers of breast cancer and that the method for identifying cancer-specific protein isoform biomarkers from clinical proteomics is effective for increasing the number of identified alternative splicing isoform markers in clinical proteomics.
9
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
  - Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215.
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115.
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706.
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706.
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706.
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706.
  - Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706.
10
Dieterich DC, Kreutz MR. Proteomics of the Synapse--A Quantitative Approach to Neuronal Plasticity. Mol Cell Proteomics 2016; 15:368-81. [PMID: 26307175 PMCID: PMC4739661 DOI: 10.1074/mcp.r115.051482] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 05/06/2015] [Revised: 07/29/2015] [Indexed: 11/06/2022] Open
Abstract
The advances in mass spectrometry-based proteomics in the past 15 years have contributed to a deeper appreciation of protein networks and the composition of functional synaptic protein complexes. However, research on the protein dynamics underlying core mechanisms of synaptic plasticity in the brain lags far behind. In this review, we provide a synopsis of proteomic research addressing various aspects of synaptic function. We discuss the major topics in the study of protein dynamics of the chemical synapse and the limitations of current methodology. We highlight recent developments and the future importance of multidimensional proteomics and metabolic labeling. Finally, emphasis is placed on the conceptual framework of modern proteomics and its current shortcomings in the quest to gain a deeper understanding of synaptic plasticity.
Affiliation(s)
- Daniela C Dieterich
  - Institute for Pharmacology and Toxicology, Otto-von-Guericke University Magdeburg, Germany.
  - Research Group Neuralomics, Leibniz Institute for Neurobiology Magdeburg, Germany.
  - Center for Behavioral Brain Sciences (CBBS), Magdeburg, Germany.
- Michael R Kreutz
  - RG Neuroplasticity, Leibniz Institute for Neurobiology, Magdeburg, Germany.
  - Center for Behavioral Brain Sciences (CBBS), Magdeburg, Germany.
11
Zhang F, Wang M, Michael T, Drabier R. Novel alternative splicing isoform biomarkers identification from high-throughput plasma proteomics profiling of breast cancer. BMC Syst Biol 2013; 7 Suppl 5:S8. [PMID: 24565027 PMCID: PMC4028860 DOI: 10.1186/1752-0509-7-s5-s8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
BACKGROUND: In the biopharmaceutical industry, biomarkers define molecular taxonomies of patients and diseases and serve as surrogate endpoints in early-phase drug trials. Molecular biomarkers can be much more sensitive than traditional lab tests. Discriminating disease biomarkers by traditional methods such as DNA microarrays has proved challenging. Alternative splicing isoforms represent a new class of diagnostic biomarkers. Recent scientific evidence demonstrates that the differentiation and quantification of individual alternative splicing isoforms could improve insights into disease diagnosis and management. Identifying and characterizing alternative splicing isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, the traditional methods used for alternative splicing isoform determination have limitations: they work at the transcriptome level, have low coverage, and focus poorly on alternative splicing.
RESULTS: We present a peptidomics approach for finding novel alternative splicing isoforms in clinical proteomics. Our results showed that the approach has significant potential in enabling the discovery of new types of high-quality alternative splicing isoform biomarkers.
CONCLUSIONS: We developed a peptidomics approach for the proteomics community to analyze, identify, and characterize alternative splicing isoforms from MS-based proteomics experiments with more coverage and an exclusive focus on alternative splicing. The approach can help generate novel hypotheses on molecular risk factors and molecular mechanisms of cancer at an early stage, leading to the identification of potentially highly specific alternative splicing isoform biomarkers for early detection of cancer.
12
Zhang F, Drabier R. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics. BMC Bioinformatics 2013; 14 Suppl 14:S13. [PMID: 24267658 PMCID: PMC3850988 DOI: 10.1186/1471-2105-14-s14-s13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has more advantages than at the mRNA level. The combination of an alternative splicing database and tandem mass spectrometry provides a powerful technique for the identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics.
RESULTS: We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity or a custom gene set, with maximum coverage and an exclusive focus on alternative splicing. First, we extracted information on the gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides.
CONCLUSIONS: The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and to interpret them in the context of pathway, disease, drug and organ specificity or a custom gene set, with maximum coverage and an exclusive focus on alternative splicing.
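The second step of the pipeline, compiling artificial splicing transcripts, can be pictured with a short Python sketch. How SASD actually enumerates events is not stated in this abstract, so the code below assumes the simplest case (one skipped internal exon or one retained intron per artificial transcript); the function names and toy sequences are hypothetical.

```python
def exon_skipping_transcripts(exons):
    """Yield (skipped_index, sequence) with one internal exon removed.

    The first and last exons are always kept (an assumption of this sketch).
    """
    for i in range(1, len(exons) - 1):
        yield i, "".join(exons[:i] + exons[i + 1:])


def intron_retention_transcripts(exons, introns):
    """Yield (retained_index, sequence) with one intron left in place.

    introns[i] is the intron between exons[i] and exons[i + 1].
    """
    for i in range(len(introns)):
        parts = []
        for j, exon in enumerate(exons):
            parts.append(exon)
            if j == i:
                parts.append(introns[i])
        yield i, "".join(parts)


# Toy gene: uppercase exons, lowercase introns (hypothetical sequences).
toy_exons = ["AAA", "CCC", "GGG", "TTT"]
toy_introns = ["gtaag", "gtcag", "gtgag"]

skipped = list(exon_skipping_transcripts(toy_exons))
retained = list(intron_retention_transcripts(toy_exons, toy_introns))
```

Each artificial transcript would then be translated and digested into peptides before being added to the search database, which corresponds to the third step of the pipeline.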
|
13
|
Marx H, Lemeer S, Klaeger S, Rattei T, Kuster B. MScDB: a mass spectrometry-centric protein sequence database for proteomics. J Proteome Res 2013; 12:2386-98. [PMID: 23627461 DOI: 10.1021/pr400215r] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Protein sequence databases are indispensable tools for life science research including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ~99% of the identifications in comparison to common databases despite a 3-48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
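The two core modules named in this abstract, an in-silico proteolytic digest and a peptide-centric grouping of indistinguishable proteins, can be sketched in Python as follows. This is an illustration under stated assumptions, not the MScDB algorithm: the abstract does not name the protease or the clustering criterion, so the sketch assumes trypsin-like cleavage (after K or R, but not before P) with no missed cleavages, and treats two proteins as indistinguishable only when their peptide sets are identical. The names `tryptic_digest` and `group_indistinguishable` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_length=1):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def group_indistinguishable(proteins):
    """Group protein names whose digests yield exactly the same peptide set.

    Assumption for this sketch: identical peptide sets are the only
    criterion for "indistinguishable by mass spectrometry".
    """
    groups = defaultdict(list)
    for name, sequence in proteins.items():
        groups[frozenset(tryptic_digest(sequence))].append(name)
    return sorted(sorted(names) for names in groups.values())


if __name__ == "__main__":
    toy_proteins = {  # made-up sequences
        "P1": "MKAAR",
        "P2": "AARMK",
        "P3": "MKAAPR",
    }
    for name, sequence in toy_proteins.items():
        print(name, tryptic_digest(sequence))
    print(group_indistinguishable(toy_proteins))
```

Grouping on the full peptide set is the simplest possible criterion; it shows why sequence-similarity clustering and peptide-level clustering can disagree, since P1 and P2 differ as sequences but produce the same peptides.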
Affiliation(s)
- Harald Marx
- Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
|
14
|
Sheynkman GM, Shortreed MR, Frey BL, Smith LM. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol Cell Proteomics 2013; 12:2341-53. [PMID: 23629695 DOI: 10.1074/mcp.o113.028142] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Indexed: 01/22/2023] Open
Abstract
Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the Uniprot-Trembl proteomic database comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. 
To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
Affiliation(s)
- Gloria M Sheynkman
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
|
15
|
Laborde CM, Mourino-Alvarez L, Akerstrom F, Padial LR, Vivanco F, Gil-Dones F, Barderas MG. Potential blood biomarkers for stroke. Expert Rev Proteomics 2013; 9:437-49. [PMID: 22967080 DOI: 10.1586/epr.12.33] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/29/2022]
Abstract
Stroke is one of the most common causes of death worldwide and a major cause of acquired disability in adults. Despite advances in research during the last decade, prevention and treatment strategies still suffer from significant limitations, and therefore new theoretical and technical approaches are required. Technological advances in the proteomic and metabolomic areas, during recent years, have permitted a more effective search for novel biomarkers and therapeutic targets that may allow for effective risk stratification and early diagnosis with subsequent rapid treatment. This review provides a comprehensive overview of the latest candidate proteins and metabolites proposed as new potential biomarkers in stroke.
Affiliation(s)
- Carlos M Laborde
- Laboratory of Vascular Pathophysiology, Hospital Nacional de Paraplejicos, SESCAM, Toledo, Spain
|
16
|
Zhang F, Drabier R. IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis. BMC Bioinformatics 2012; 13 Suppl 15:S7. [PMID: 23046449 PMCID: PMC3439721 DOI: 10.1186/1471-2105-13-s15-s7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
Abstract
Background Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, and there is an urgent need for a better way to accurately interpret and distill such large amounts of data. Extensive pathway and network analysis allows for the discovery of highly significant pathways from a set of disease vs. healthy samples in the NGS and GWAS. Knowledge of activation of these processes will lead to elucidation of the complex biological pathways affected by drug treatment, to patient stratification studies of new and existing drug treatments, and to understanding the underlying anti-cancer drug effects. There are approximately 141 biological human pathway resources as of Jan 2012 according to the Pathguide database. However, most currently available resources do not contain disease, drug or organ specificity information such as disease-pathway, drug-pathway, and organ-pathway associations. Systematically integrating pathway, disease, drug and organ specificity together becomes increasingly crucial for understanding the interrelationships between signaling, metabolic and regulatory pathways, drug action, disease susceptibility, and organ specificity from high-throughput omics data (genomics, transcriptomics, proteomics and metabolomics).
Results We designed the Integrated Pathway Analysis Database for Systematic Enrichment Analysis (IPAD, http://bioinfo.hsc.unt.edu/ipad), defining inter-association between pathway, disease, drug and organ specificity, based on six criteria: 1) comprehensive pathway coverage; 2) gene/protein to pathway/disease/drug/organ association; 3) inter-association between pathway, disease, drug, and organ; 4) multiple and quantitative measurement of enrichment and inter-association; 5) assessment of enrichment and inter-association analysis in the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources; and 6) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, and organ. Moreover, the quality of the database was validated in the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study.
Conclusions IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications. Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies.
Affiliation(s)
- Fan Zhang
- Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA
|
17
|
Veitinger M, Umlauf E, Baumgartner R, Badrnya S, Porter J, Lamont J, Gerner C, Gruber CW, Oehler R, Zellner M. A combined proteomic and genetic analysis of the highly variable platelet proteome: from plasmatic proteins and SNPs. J Proteomics 2012; 75:5848-60. [PMID: 22885077 DOI: 10.1016/j.jprot.2012.07.042] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 06/29/2012] [Revised: 07/24/2012] [Accepted: 07/26/2012] [Indexed: 01/09/2023]
Abstract
High biological variation in protein expression represents a major challenge in clinical proteomics. In a study based on 2D-DIGE, we found that the standardised abundance of only a few proteins varied by more than 50%. While some of the most highly variable proteins in platelets of 52 healthy elderly individuals were of plasmatic origin, such as albumin or haptoglobin, the absence of several other high-abundance plasma proteins strongly suggests that plasma-derived proteins represent an integral part of the platelet proteome. Amongst the highly variable platelet-derived proteins, two spots were both identified as GSTO1 and assigned to either the wild-type or mutant isoform of SNP A140D. Remarkably, when the spots were considered within the respective genotype groups, their CV decreased to about the median variation. Although 2D-DIGE allowed correct genotyping, two individuals seemed to be GSTO1*A140 deficient. Probing 2D-Western blots with novel mAb, however, detected A140 protein as an additional spot at pH 8.1, caused by the SNPs E155del and E208K. In contrast to previous studies, we show that GSTO1 protein is expressed in vivo, despite the deletion E155. Our data indicate that incorporation of exogenous proteins and genetic polymorphisms of endogenous proteins represent the main source of extreme biological variation in the platelet proteome.
Affiliation(s)
- Michael Veitinger
- Institute of Physiology, Center of Physiology and Pharmacology, Medical University of Vienna, A-1090 Vienna, Austria
|
18
|
Wren JD, Kupfer DM, Perkins EJ, Bridges S, Winters-Hilt S, Dozmorov MG, Braga-Neto U. Proceedings of the 2011 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2011; 12 Suppl 10:S1. [PMID: 22165918 PMCID: PMC3236831 DOI: 10.1186/1471-2105-12-s10-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
|
19
|
Wren JD, Kupfer DM, Perkins EJ, Bridges S, Berleant D. Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2010; 11 Suppl 6:S1. [PMID: 20946592 PMCID: PMC3026356 DOI: 10.1186/1471-2105-11-s6-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|