1
Bollon J, Shortreed MR, Jordan BT, Miller R, Jeffery E, Cavalli A, Smith LM, Dewey C, Sheynkman GM, Tiberi S. IsoBayes: a Bayesian approach for single-isoform proteomics inference. bioRxiv 2024:2024.06.10.598223. [PMID: 38915658 PMCID: PMC11195044 DOI: 10.1101/2024.06.10.598223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is bottom-up mass spectrometry proteomics, which returns peptide identifications that are indirectly used to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample whether a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of the selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we: i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances correlate highly with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package and is accompanied by an example usage vignette.
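The two-layer idea in this abstract can be illustrated with a small Python sketch. It is not the IsoBayes sampler: it replaces both sampling steps with their expected values (an EM-style simplification), and the function name `allocate_peptides`, the input layout, and the toy numbers are all assumptions made for illustration. The starting weights only initialise the iteration; they do not act as a Bayesian prior here.

```python
def allocate_peptides(peptides, start_weights, n_iter=50):
    """Toy expectation-style analogue of a two-layer allocation.

    peptides: list of (abundance, prob_correct, compatible_isoforms) tuples.
    start_weights: dict mapping isoform name -> starting relative abundance
                   (for example, transcript relative abundances).
    Returns a dict of estimated relative isoform abundances.
    """
    pi = dict(start_weights)
    for _ in range(n_iter):
        totals = {iso: 0.0 for iso in pi}
        for abundance, prob_correct, isoforms in peptides:
            # Layer 1 (simplified): weight the peptide by the probability
            # that it was correctly detected, instead of sampling a 0/1 flag.
            effective = abundance * prob_correct
            # Layer 2 (simplified): split the abundance across compatible
            # isoforms in proportion to the current estimates.
            norm = sum(pi[iso] for iso in isoforms)
            if norm == 0:
                continue
            for iso in isoforms:
                totals[iso] += effective * pi[iso] / norm
        total = sum(totals.values())
        if total == 0:
            break
        pi = {iso: value / total for iso, value in totals.items()}
    return pi


# Hypothetical data: one peptide unique to isoform A, one shared by A and B.
toy_peptides = [(10.0, 1.0, ["A"]), (10.0, 1.0, ["A", "B"])]
estimate = allocate_peptides(toy_peptides, {"A": 0.5, "B": 0.5})
```

In this toy case isoform B has no unique evidence, so the shared peptide is progressively attributed to A. A full Bayesian treatment would also report the uncertainty of that attribution, which this sketch does not.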
Affiliation(s)
- Jordy Bollon
  - Computational and Chemical Biology, Italian Institute of Technology, CMPVdA, Aosta, Italy
  - Astronomical Observatory of the Autonomous Region of the Aosta Valley (OAVdA), Nus, Italy
- Ben T Jordan
  - Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Rachel Miller
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Erin Jeffery
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Andrea Cavalli
  - Computational and Chemical Biology, Italian Institute of Technology, CMPVdA, Aosta, Italy
  - Centre Européen de Calcul Atomique et Moléculaire, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Colin Dewey
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- Gloria M Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Simone Tiberi
  - Department of Statistical Sciences, University of Bologna, Bologna, Italy

2
Bishop DJ, Hoffman NJ, Taylor DF, Saner NJ, Lee MJC, Hawley JA. Discordant skeletal muscle gene and protein responses to exercise. Trends Biochem Sci 2023; 48:927-936. [PMID: 37709636 DOI: 10.1016/j.tibs.2023.08.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/16/2023] [Revised: 08/07/2023] [Accepted: 08/16/2023] [Indexed: 09/16/2023]
Abstract
The ability of skeletal muscle to adapt to repeated contractile stimuli is one of the most intriguing aspects of physiology. The molecular bases underpinning these adaptations involve increased protein activity and/or expression, mediated by an array of pre- and post-transcriptional processes, as well as translational and post-translational control. A longstanding dogma assumes a direct relationship between exercise-induced increases in mRNA levels and subsequent changes in the abundance of the proteins they encode. Drawing on the results of recent studies, we dissect and question the common assumption of a direct relationship between changes in the skeletal muscle transcriptome and proteome induced by repeated muscle contractions (e.g., exercise).
Affiliation(s)
- David J Bishop
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
- Nolan J Hoffman
  - Exercise and Nutrition Research Program, Mary MacKillop Institute for Health Research, Australian Catholic University, Melbourne, Australia
- Dale F Taylor
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
- Nicholas J Saner
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
- Matthew J-C Lee
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
- John A Hawley
  - Exercise and Nutrition Research Program, Mary MacKillop Institute for Health Research, Australian Catholic University, Melbourne, Australia

3
Feng S, Ji HL, Wang H, Zhang B, Sterzenbach R, Pan C, Guo X. MetaLP: An integrative linear programming method for protein inference in metaproteomics. PLoS Comput Biol 2022; 18:e1010603. [PMID: 36269761 PMCID: PMC9629623 DOI: 10.1371/journal.pcbi.1010603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 11/02/2022] [Accepted: 09/26/2022] [Indexed: 11/07/2022]
Abstract
Metaproteomics based on high-throughput tandem mass spectrometry (MS/MS) plays a crucial role in characterizing microbiome functions. The acquired MS/MS data are searched against a protein sequence database to identify peptides, which are then used to infer a list of proteins present in a metaproteome sample. While the problem of protein inference has been well studied for proteomics of single organisms, it remains a major challenge for metaproteomics of complex microbial communities because of the large number of degenerate peptides shared among homologous proteins in different organisms. This challenge calls for improved discrimination of true protein identifications from false protein identifications given a set of unique and degenerate peptides identified in metaproteomics. MetaLP was developed here for protein inference in metaproteomics using an integrative linear programming method. Taxonomic abundance information extracted from metagenomics shotgun sequencing or 16S rRNA gene amplicon sequencing was incorporated as prior information in MetaLP. Benchmarking with mock, human gut, soil, and marine microbial communities demonstrated significantly higher numbers of protein identifications by MetaLP than by ProteinLP, PeptideProphet, DeepPep, PIPQ, and Sipros Ensemble. In conclusion, MetaLP could substantially improve protein inference for complex metaproteomes by incorporating taxonomic abundance information in a linear programming model.
Affiliation(s)
- Shichao Feng
  - Department of Computer Science and Engineering, University of North Texas, Denton, Texas, United States of America
- Hong-Long Ji
  - Department of Cellular and Molecular Biology, University of Texas at Tyler, Tyler, Texas, United States of America
  - Texas Lung Injury Institute, University of Texas at Tyler, Tyler, Texas, United States of America
- Huan Wang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
- Bailu Zhang
  - Department of Computer Science and Engineering, University of North Texas, Denton, Texas, United States of America
- Ryan Sterzenbach
  - Department of Computer Science and Engineering, University of North Texas, Denton, Texas, United States of America
  - Department of Biomedical Engineering, University of North Texas, Denton, Texas, United States of America
- Chongle Pan
  - School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, Denton, Texas, United States of America

4
Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022; 23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]
Abstract
Background: Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach. Results: We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible. Conclusions: In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02701-2.
Affiliation(s)
- Laura Fancello
  - CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France
- Thomas Burger
  - CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France

5
Liu W, Liu Q, Zhang B, Lin Z, Li X, Yang X, Pu M, Zou R, He Z, Wang F, Dou K. The mRNA of TCTP functions as a sponge to maintain homeostasis of TCTP protein levels in hepatocellular carcinoma. Cell Death Dis 2020; 11:974. [PMID: 33184257 PMCID: PMC7665032 DOI: 10.1038/s41419-020-03149-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2020] [Revised: 10/10/2020] [Accepted: 10/13/2020] [Indexed: 01/01/2023]
Abstract
Translationally controlled tumor protein (TCTP) is a highly conserved protein that accumulates during the tumorigenesis of various malignancies. Despite the important role of TCTP protein in tumor progression, the precise function and underlying mechanistic regulation of TCTP mRNA in hepatocellular carcinoma (HCC) remain unclear. In this study, we found that TCTP protein was overexpressed in HCC patients, whereas TCTP mRNA expression levels showed the opposite trend. TCTP knockout HCC cells exhibited attenuated abilities of proliferation, migration, and invasion. The knockdown of TCTP by siRNA effectively reduced TCTP mRNA levels but not protein levels in HCC cells. Moreover, although the constitutive knockdown of TCTP inhibited almost 80% of TCTP protein expression levels in tumors of wildtype transgenic mice (TCTP KD/WT), partial restoration of TCTP protein expression was observed in the tumors of heterozygous TCTP mice (TCTP KD/TCTP±). The blockage of mRNA synthesis with ActD stimulated TCTP protein expression in HCC cells. In contrast, combined treatment with ActD and CHX, or MG132 treatment alone, did not lead to TCTP protein accumulation in cells. Furthermore, following the introduction of exogenous TCTP in cells and orthotopic HCC tumor models, the endogenous TCTP protein did not change with the recombinational TCTP expression and remained at a rather stable level. Dual-luciferase assays revealed that the coding sequence of TCTP mRNA functions as a sponge to regulate TCTP protein expression. Collectively, our results indicate that TCTP mRNA and protein form a closed regulatory circuit that works as a buffering system to maintain the homeostasis of TCTP protein levels in HCC.
Affiliation(s)
- Wei Liu
  - Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Shanghai, 200123, China
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
  - Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China
- Qi Liu
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
- Beilei Zhang
  - Department of Gynecology and Obstetrics, Tangdu Hospital, Air Force Medical University, Xi'an, Shaanxi, 710038, China
- Zhibin Lin
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
- Xia Li
  - Institute of Biophysics, Chinese Academy of Science, Beijing, 100101, China
- Xisheng Yang
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
- Meng Pu
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
- Rongzhi Zou
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China
- Zhiying He
  - Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Shanghai, 200123, China
  - Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China
- Fu Wang
  - Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Kefeng Dou
  - Department of Hepatobiliary Surgery, Xijing Hospital, Air Force Medical University, Xi'an, Shaanxi Province, 710032, China

6
Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020; 29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems. The translation and function of many alternative splicing events await confirmation at the protein level. Lau et al. use an integrated proteotranscriptomics approach to identify non-canonical and undocumented isoforms from 12 organs in the human proteome. Alternative isoforms interfere with functional sequence features and are differentially regulated during iPSC cardiomyocyte differentiation.
Affiliation(s)
- Edward Lau
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Yu Han
  - Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Damon R Williams
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Cody T Thomas
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Rajani Shrestha
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Joseph C Wu
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Palo Alto, CA, USA
- Maggie P Y Lam
  - Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA

7
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automate this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. An appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is therefore much needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
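For readers unfamiliar with the target-decoy approach mentioned in this abstract, the following Python sketch shows the generic estimate: among identifications at or above a score threshold, the number of decoy hits divided by the number of target hits. It is not the protein probability model or the refined picked method described by the authors; the function name `target_decoy_fdr` and the example scores are hypothetical.

```python
def target_decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR at a score threshold as (#decoys / #targets) among
    identifications scoring at or above the threshold.

    scores: list of identification scores (higher is better).
    is_decoy: list of booleans, True when the match is to a decoy sequence.
    Returns 0.0 when no target passes the threshold.
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    return decoys / targets if targets else 0.0


# Hypothetical scores: four target matches and one decoy match.
example_scores = [10.0, 9.0, 8.0, 7.0, 6.0]
example_is_decoy = [False, False, False, True, False]
fdr_all = target_decoy_fdr(example_scores, example_is_decoy, threshold=6.0)     # 1 / 4
fdr_strict = target_decoy_fdr(example_scores, example_is_decoy, threshold=8.0)  # 0 / 3
```

Raising the threshold until the estimate falls below a chosen level (for example 1%) is the usual way such an estimate is applied at the peptide level; protein-level control needs additional care, which is the subject of the paper.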
Affiliation(s)
- Gorka Prieto
  - Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
- Jesús Vázquez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain

8
Du Y, Clair GC, Al Alam D, Danopoulos S, Schnell D, Kitzmiller JA, Misra RS, Bhattacharya S, Warburton D, Mariani TJ, Pryhuber GS, Whitsett JA, Ansong C, Xu Y. Integration of transcriptomic and proteomic data identifies biological functions in cell populations from human infant lung. Am J Physiol Lung Cell Mol Physiol 2019; 317:L347-L360. [PMID: 31268347 DOI: 10.1152/ajplung.00475.2018] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Systems biology uses computational approaches to integrate diverse data types to understand cell and organ behavior. Data derived from complementary technologies, for example transcriptomic and proteomic analyses, are providing new insights into development and disease. We compared mRNA and protein profiles from purified endothelial, epithelial, immune, and mesenchymal cells from normal human infant lung tissue. Signatures for each cell type were identified and compared at both mRNA and protein levels. Cell-specific biological processes and pathways were predicted by analysis of concordant and discordant RNA-protein pairs. Cell clustering and gene set enrichment comparisons identified shared versus unique processes associated with transcriptomic and/or proteomic data. Clear cell-cell correlations between mRNA and protein data were obtained from each cell type. Approximately 40% of RNA-protein pairs were coherently expressed. While the correlation between RNA and their protein products was relatively low (Spearman rank coefficient rs ~0.4), cell-specific signature genes involved in functional processes characteristic of each cell type were more highly correlated with their protein products. Consistency of cell-specific RNA-protein signatures indicated an essential framework for the function of each cell type. Visualization and reutilization of the protein and RNA profiles are supported by a new web application, "LungProteomics," which is freely accessible to the public.
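The Spearman rank coefficient quoted in this abstract is the Pearson correlation of the ranks of the two variables. A minimal pure-Python sketch follows; the function names and the example values are illustrative and are not taken from the study. It assumes neither input is constant, since a constant input would make the denominator zero.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mRNA and protein levels for five genes.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(mrna, protein)  # approximately 0.8 for these values
```

Because only ranks enter the calculation, the coefficient is unaffected by monotone transformations such as taking logarithms of expression values.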
Affiliation(s)
- Yina Du
  - The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Geremy C Clair
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington
- Denise Al Alam
  - Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California
  - Keck School of Medicine, University of Southern California, Los Angeles, California
- Soula Danopoulos
  - Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California
  - Keck School of Medicine, University of Southern California, Los Angeles, California
- Daniel Schnell
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
  - Heart Institute and Center for Translational Fibrosis Research, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Joseph A Kitzmiller
  - The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Ravi S Misra
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
- Soumyaroop Bhattacharya
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
  - Division of Neonatology and Program in Pediatric Molecular and Personalized Medicine, University of Rochester Medical Center, Rochester, New York
- David Warburton
  - Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California
  - Keck School of Medicine, University of Southern California, Los Angeles, California
- Thomas J Mariani
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
  - Division of Neonatology and Program in Pediatric Molecular and Personalized Medicine, University of Rochester Medical Center, Rochester, New York
- Gloria S Pryhuber
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
- Jeffrey A Whitsett
  - The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Charles Ansong
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington
- Yan Xu
  - The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio

9
Zelic R, Fiano V, Ebot EM, Coseo Markt S, Grasso C, Trevisan M, De Marco L, Delsedime L, Zugna D, Mucci LA, Richiardi L. Single-nucleotide polymorphisms in DNMT3B gene and DNMT3B mRNA expression in association with prostate cancer mortality. Prostate Cancer Prostatic Dis 2019; 22:284-291. [PMID: 30341411 DOI: 10.1038/s41391-018-0102-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/05/2018] [Revised: 09/04/2018] [Accepted: 09/08/2018] [Indexed: 01/02/2023]
Abstract
BACKGROUND: Germline variants in DNA methyltransferase 3B (DNMT3B) may influence DNMT3B enzymatic activity, which, in turn, may affect cancer aggressiveness by altering DNA methylation. METHODS: The study involves two Italian cohorts (NTAT cohort, n = 157, and 1980s biopsy cohort, n = 182) and two U.S. cohorts (Health Professionals Follow-Up Study, n = 214, and Physicians' Health Study, n = 298) of prostate cancer (PCa) patients, and a case-control study of lethal (n = 113) vs indolent (n = 290) PCa with DNMT3B mRNA expression data nested in the U.S. cohorts. We evaluated the association between: three selected DNMT3B variants and global DNA methylation using linear regression in the NTAT cohort, the three DNMT3B variants and PCa mortality using Cox proportional hazards regression in all cohorts, and DNMT3B expression and lethal PCa using logistic regression, with replication in publicly available databases (TCGA, n = 492 and MSKCC, n = 140). RESULTS: The TT genotype of rs1569686 was associated with LINE-1 hypomethylation in tumor tissue (β = -2.71, 95% CI: -5.41, -0.05). There was no evidence of association between DNMT3B variants and PCa mortality. DNMT3B expression was consistently associated with lethal PCa in the two U.S. cohorts (3rd vs 1st tertile, combined cohorts: OR = 2.04, 95% CI: 1.13, 3.76); the association was replicated in TCGA and MSKCC data (3rd vs 1st tertile, TCGA: HR = 3.00, 95% CI: 1.78, 5.06; MSKCC: HR = 2.22, 95% CI: 1.01, 4.86). CONCLUSIONS: Although there was no consistent evidence of an association between DNMT3B variants and PCa mortality, the TT genotype of rs1569686 was associated with LINE-1 hypomethylation in tumor tissue and DNMT3B mRNA expression was associated with an increased risk of lethal PCa.
Affiliation(s)
- Renata Zelic
  - Clinical Epidemiology Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- Valentina Fiano
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
- Ericka M Ebot
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Sarah Coseo Markt
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Chiara Grasso
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
- Morena Trevisan
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
- Laura De Marco
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
- Luisa Delsedime
  - Division of Pathology, A.O.U. Città della Salute e della Scienza Hospital, Turin, Italy
- Daniela Zugna
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
- Lorelei A Mucci
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Lorenzo Richiardi
  - Cancer Epidemiology Unit-CERMS, Department of Medical Sciences, University of Turin, and CPO-Piemonte, Turin, Italy
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA

10
Hibbert SA, Ozols M, Griffiths CEM, Watson REB, Bell M, Sherratt MJ. Defining tissue proteomes by systematic literature review. Sci Rep 2018; 8:546. [PMID: 29323144 PMCID: PMC5765030 DOI: 10.1038/s41598-017-18699-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/18/2017] [Accepted: 12/14/2017] [Indexed: 12/24/2022]
Abstract
Defining protein composition is a key step in understanding the function of both healthy and diseased biological systems. There is currently little consensus between existing published proteomes in tissues such as the aorta and cartilage and in organs such as skin. Lack of agreement as to both the number and identity of proteins may be due to issues in protein extraction, sensitivity/specificity of detection and the use of disparate tissue/cell sources. Here, we developed a method combining bioinformatics and systematic review to screen >32M articles from the Web of Science for evidence of proteins in healthy human skin. The resulting Manchester Proteome (www.manchesterproteome.manchester.ac.uk) collates existing evidence which characterises 2,948 skin proteins, 437 unique to our database and 2,011 evidenced by both mass spectrometry and immune-based techniques. This approach circumvents the limitations of individual proteomics studies and can be applied to other species, organs, cells or disease-states. Accurate tissue proteomes will aid development of engineered constructs and offer insight into disease treatments by highlighting differences in proteomic composition.
Affiliation(s)
- Sarah A Hibbert
- Division of Cell Matrix Biology & Regenerative Medicine, The University of Manchester, Manchester, UK.
| | - Matiss Ozols
- Division of Cell Matrix Biology & Regenerative Medicine, The University of Manchester, Manchester, UK
| | - Christopher E M Griffiths
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK; Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
| | - Rachel E B Watson
- Centre for Dermatology Research, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK; Salford Royal NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
| | - Mike Bell
- Walgreens Boots Alliance, Thane Road, Nottingham, UK
| | - Michael J Sherratt
- Division of Cell Matrix Biology & Regenerative Medicine, The University of Manchester, Manchester, UK.
| |
|
11
|
Zhong J, Wang J, Ding X, Zhang Z, Li M, Wu FX, Pan Y. Protein Inference from the Integration of Tandem MS Data and Interactome Networks. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1399-1409. [PMID: 28113634 DOI: 10.1109/tcbb.2016.2601618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, other information has been exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being present in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
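The Bayes' law step mentioned in this abstract can be illustrated with a minimal sketch. This is not TMSIN's implementation: the function name, the prior probabilities, and the use of a neighbor support score as a likelihood-like weight are all assumptions made for illustration only.

```python
def shared_peptide_posteriors(priors, support):
    """Distribute one shared peptide across its candidate proteins.

    priors  -- dict mapping protein -> prior probability of presence
               (hypothetical input, not taken from the paper)
    support -- dict mapping protein -> non-negative neighbor support score,
               treated here as a likelihood-like weight (an assumption)
    Returns a dict mapping protein -> posterior probability (sums to 1).
    """
    # Unnormalized posterior: prior times support weight.
    weights = {p: priors[p] * support[p] for p in priors}
    total = sum(weights.values())
    if total == 0:
        # No evidence for any candidate: fall back to a uniform split.
        return {p: 1.0 / len(priors) for p in priors}
    # Normalize so the probabilities sum to one (Bayes' rule denominator).
    return {p: w / total for p, w in weights.items()}


posteriors = shared_peptide_posteriors(
    priors={"protA": 0.5, "protB": 0.5},
    support={"protA": 3.0, "protB": 1.0},
)
```

With equal priors, the protein whose interaction partners give it three times the support receives three quarters of the peptide's probability mass in this toy setting.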
|
12
|
Tsai MA, Chen IH, Wang JH, Chou SJ, Li TH, Leu MY, Ho HK, Yang WC. A probe-based qRT-PCR method to profile immunological gene expression in blood of captive beluga whales (Delphinapterus leucas). PeerJ 2017; 5:e3840. [PMID: 28970970 PMCID: PMC5622604 DOI: 10.7717/peerj.3840] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/18/2017] [Accepted: 09/01/2017] [Indexed: 12/20/2022] Open
Abstract
Cytokines are fundamental for a functioning immune system, and thus potentially serve as important indicators of animal health. Quantitation of mRNA using quantitative reverse transcription polymerase chain reaction (qRT-PCR) is an established immunological technique. It is particularly suitable for detecting the expression of proteins against which monoclonal antibodies are not available. In this study, we developed a probe-based quantitative gene expression assay for immunological assessment of captive beluga whales (Delphinapterus leucas), one of the most common cetacean species on display in aquariums worldwide. Six immunologically relevant genes (IL-2Rα, -4, -10, -12, TNFα, and IFNγ) were selected for analysis, and two validated housekeeping genes (PGK1 and RPL4) with stable expression were used as reference genes. Sixteen blood samples were obtained from four animals with different health conditions and stored in RNAlater™ solution. These samples were used for RNA extraction followed by qRT-PCR analysis. Analysis of gene transcripts was performed by relative quantitation using the comparative Cq method with the integration of amplification efficiency and two reference genes. The expression levels of each gene in the samples from clinically healthy animals were normally distributed. Transcript outliers for IL-2Rα, IL-4, IL-12, TNFα, and IFNγ were noticed in four samples collected from two clinically unhealthy animals. This assay has the potential to identify deviations of the immune system from its normal state that are caused by health problems. Furthermore, knowing the immune status of captive cetaceans could help both trainers and veterinarians in implementing preventive approaches prior to disease onset.
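The efficiency-corrected comparative Cq calculation with two reference genes can be sketched as follows. This uses one common formulation (relative quantity E^(Cq_calibrator - Cq_sample), normalized by the geometric mean of the reference-gene quantities); the abstract does not give the study's exact formula, so the function names and all numeric values below are illustrative assumptions.

```python
import math


def relative_quantity(efficiency, cq_calibrator, cq_sample):
    """Efficiency-corrected quantity of one gene relative to a calibrator.

    efficiency is the amplification factor per cycle (2.0 means 100%).
    """
    return efficiency ** (cq_calibrator - cq_sample)


def normalized_expression(target, references):
    """Normalize a target gene by the geometric mean of reference genes.

    target and each entry of references are tuples of
    (efficiency, cq_calibrator, cq_sample); the values are hypothetical.
    """
    rq_target = relative_quantity(*target)
    rq_refs = [relative_quantity(*ref) for ref in references]
    # Geometric mean of the reference-gene relative quantities.
    norm_factor = math.exp(sum(math.log(rq) for rq in rq_refs) / len(rq_refs))
    return rq_target / norm_factor


# Made-up Cq values: target drops by 2 cycles, both references are unchanged.
fold_change = normalized_expression(
    target=(2.0, 25.0, 23.0),
    references=[(2.0, 20.0, 20.0), (2.0, 18.0, 18.0)],
)
```

Using the geometric rather than arithmetic mean keeps the normalization factor from being dominated by the more abundant reference gene.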
Affiliation(s)
- Ming-An Tsai
- Department of Biology, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan
| | - I-Hua Chen
- College of Veterinary Medicine, National Chiayi University, Chiayi, Taiwan
| | - Jiann-Hsiung Wang
- College of Veterinary Medicine, National Chiayi University, Chiayi, Taiwan
| | - Shih-Jen Chou
- College of Veterinary Medicine, National Chiayi University, Chiayi, Taiwan
| | - Tsung-Hsien Li
- Department of Biology, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan
| | - Ming-Yih Leu
- Department of Biology, National Museum of Marine Biology and Aquarium, Pingtung, Taiwan
| | - Hsiao-Kuan Ho
- Department of Biology, Hi-Scene World Enterprise Co. Ltd., Pingtung, Taiwan
| | - Wei Cheng Yang
- College of Veterinary Medicine, National Chiayi University, Chiayi, Taiwan
| |
|
13
|
Langella O, Valot B, Balliau T, Blein-Nicolas M, Bonhomme L, Zivy M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J Proteome Res 2016; 16:494-503. [DOI: 10.1021/acs.jproteome.6b00632] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 01/03/2023]
Affiliation(s)
- Olivier Langella
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
| | - Benoît Valot
- UMR 6249 Chrono-Environnement, CNRS, Université de Bourgogne Franche-Comté, 25030 Besançon, France
| | - Thierry Balliau
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
| | - Mélisande Blein-Nicolas
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
| | - Ludovic Bonhomme
- INRA/UBP, UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, F63100 Clermont-Ferrand, France
| | - Michel Zivy
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
| |
|
14
|
Protein inference: A protein quantification perspective. Comput Biol Chem 2016; 63:21-29. [DOI: 10.1016/j.compbiolchem.2016.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/15/2016] [Accepted: 02/01/2016] [Indexed: 01/04/2023]
|
15
|
Zhao L, Chen Y, Bajaj AO, Eblimit A, Xu M, Soens ZT, Wang F, Ge Z, Jung SY, He F, Li Y, Wensel TG, Qin J, Chen R. Integrative subcellular proteomic analysis allows accurate prediction of human disease-causing genes. Genome Res 2016; 26:660-9. [PMID: 26912414 PMCID: PMC4864458 DOI: 10.1101/gr.198911.115] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/13/2015] [Accepted: 02/19/2016] [Indexed: 12/04/2022]
Abstract
Proteomic profiling on subcellular fractions provides invaluable information regarding both protein abundance and subcellular localization. When integrated with other data sets, it can greatly enhance our ability to predict gene function genome-wide. In this study, we performed a comprehensive proteomic analysis on the light-sensing compartment of photoreceptors called the outer segment (OS). By comparing with the protein profile obtained from the retina tissue depleted of OS, an enrichment score for each protein is calculated to quantify protein subcellular localization, and 84% accuracy is achieved compared with experimental data. By integrating the protein OS enrichment score, the protein abundance, and the retina transcriptome, the probability of a gene playing an essential function in photoreceptor cells is derived with high specificity and sensitivity. As a result, a list of genes that will likely result in human retinal disease when mutated was identified and validated by previous literature and/or animal model studies. Therefore, this new methodology demonstrates the synergy of combining subcellular fractionation proteomics with other omics data sets and is generally applicable to other tissues and diseases.
Affiliation(s)
- Li Zhao
- Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Yiyun Chen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Amol Onkar Bajaj
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Aiden Eblimit
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Mingchu Xu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Zachry T Soens
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Feng Wang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Zhongqi Ge
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Sung Yun Jung
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Feng He
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Yumei Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Theodore G Wensel
- Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Jun Qin
- Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Rui Chen
- Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| |
|
16
|
Moscovitz JE, Nahar MS, Shalat SL, Slitt AL, Dolinoy DC, Aleksunes LM. Correlation between Conjugated Bisphenol A Concentrations and Efflux Transporter Expression in Human Fetal Livers. Drug Metab Dispos 2016; 44:1061-5. [PMID: 26851240 DOI: 10.1124/dmd.115.068668] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/30/2015] [Accepted: 02/04/2016] [Indexed: 12/14/2022]
Abstract
Because of its widespread use in the manufacturing of consumer products over several decades, human exposure to bisphenol A (BPA) has been pervasive. Fetuses are particularly sensitive to BPA exposure, with a number of negative developmental and reproductive outcomes observed in rodent perinatal models. Xenobiotic transporters are one mechanism to extrude conjugated and unconjugated BPA from the liver. In this study, the mRNA expression of xenobiotic transporters and relationships with total, conjugated, and free BPA levels were explored utilizing human fetal liver samples. The mRNA expression of breast cancer resistance protein (BCRP) and multidrug resistance-associated transporter (MRP)4, as well as BCRP and multidrug resistance transporter 1 exhibited the highest degree of correlation, with r² values of 0.941 and 0.816 (P < 0.001 for both), respectively. Increasing concentrations of conjugated BPA significantly correlated with high expression of MRP1 (P < 0.001), MRP2 (P < 0.05), and MRP3 (P < 0.05) transporters, in addition to the NF-E2-related factor 2 transcription factor (P < 0.001) and its prototypical target gene, NAD(P)H quinone oxidoreductase 1 (P < 0.001). These data demonstrate that xenobiotic transporters may be coordinately expressed in the human fetal liver. This is also the first report of a relationship between environmentally relevant fetal BPA levels and differences in the expression of transporters that can excrete the parent compound and its metabolites.
Affiliation(s)
- Jamie E Moscovitz
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey
| | - Muna S Nahar
- Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan
| | - Stuart L Shalat
- Division of Environmental Health, School of Public Health, Georgia State University, Atlanta, Georgia; Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey; Environmental and Occupational Health Sciences Institute, Piscataway, New Jersey
| | - Angela L Slitt
- Department of Biomedical and Pharmaceutical Sciences, University of Rhode Island, Kingston, Rhode Island
| | - Dana C Dolinoy
- Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan; Department of Nutritional Sciences, University of Michigan, Ann Arbor, Michigan
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey; Environmental and Occupational Health Sciences Institute, Piscataway, New Jersey
| |
|
17
|
He Z, Huang T, Zhao C, Teng B. Protein Inference. Adv Exp Med Biol 2016; 919:237-242. [PMID: 27975221 DOI: 10.1007/978-3-319-41448-5_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
Protein inference is one of the most important steps in protein identification, which transforms peptides identified from tandem mass spectra into a list of proteins. In this chapter, we provide a brief introduction on this problem and present a short summary on the existing protein inference methods in the literature.
Affiliation(s)
- Zengyou He
- School of Software, Dalian University of Technology, Dalian, China.
| | - Ting Huang
- College of Computer and Information Science, Northeastern University, Boston, MA, USA
| | - Can Zhao
- School of Software, Dalian University of Technology, Dalian, China
| | - Ben Teng
- School of Software, Dalian University of Technology, Dalian, China
| |
|
18
|
Zhao C, Liu D, Teng B, He Z. BagReg: Protein inference through machine learning. Comput Biol Chem 2015; 57:12-20. [DOI: 10.1016/j.compbiolchem.2015.02.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 10/24/2022]
|
19
|
Bauernfeind AL, Reyzer ML, Caprioli RM, Ely JJ, Babbitt CC, Wray GA, Hof PR, Sherwood CC. High spatial resolution proteomic comparison of the brain in humans and chimpanzees. J Comp Neurol 2015; 523:2043-61. [PMID: 25779868 DOI: 10.1002/cne.23777] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/17/2014] [Revised: 02/03/2015] [Accepted: 03/11/2015] [Indexed: 12/30/2022]
Abstract
We performed high-throughput mass spectrometry at high spatial resolution from individual regions (anterior cingulate and primary motor, somatosensory, and visual cortices) and layers of the neocortex (layers III, IV, and V) and cerebellum (granule cell layer), as well as the caudate nucleus in humans and chimpanzees. A total of 39 mass spectrometry peaks were matched with probable protein identifications in both species, allowing for comparison in expression. We explored how the pattern of protein expression varies across regions and cortical layers to provide insights into the differences in molecular phenotype of these neural structures between species. The expression of proteins differed principally in a region- and layer-specific pattern, with more subtle differences between species. Specifically, human and chimpanzee brains were similar in their distribution of proteins related to the regulation of transcription and enzyme activity but differed in their expression of proteins supporting aerobic metabolism. Whereas most work assessing molecular expression differences in the brains of primates has been performed on gene transcripts, this dataset extends current understanding of the differential molecular expression that may underlie human cognitive specializations.
Affiliation(s)
- Amy L Bauernfeind
- Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri, 63110; Department of Anthropology, Washington University in St. Louis, St. Louis, Missouri, 63130; Department of Anthropology, The George Washington University, Washington, DC, 20052
| | - Michelle L Reyzer
- Mass Spectrometry Research Center, Vanderbilt University Medical Center, Nashville, Tennessee, 37232; Department of Biochemistry, Vanderbilt University Medical Center, Nashville, Tennessee, 37232
| | - Richard M Caprioli
- Mass Spectrometry Research Center, Vanderbilt University Medical Center, Nashville, Tennessee, 37232; Department of Biochemistry, Vanderbilt University Medical Center, Nashville, Tennessee, 37232
| | - John J Ely
- MAEBIOS-TM, Alamogordo, New Mexico, 88310
| | - Courtney C Babbitt
- Department of Biology, University of Massachusetts Amherst, Amherst, Massachusetts 01003
| | - Gregory A Wray
- Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, 27708; Department of Biology, Duke University, Durham, North Carolina, 27708; Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, 27708
| | - Patrick R Hof
- Fishberg Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029; New York Consortium in Evolutionary Primatology, New York, New York
| | - Chet C Sherwood
- Department of Anthropology, The George Washington University, Washington, DC, 20052
| |
|
20
|
Bauernfeind AL, Soderblom EJ, Turner ME, Moseley MA, Ely JJ, Hof PR, Sherwood CC, Wray GA, Babbitt CC. Evolutionary Divergence of Gene and Protein Expression in the Brains of Humans and Chimpanzees. Genome Biol Evol 2015; 7:2276-88. [PMID: 26163674 PMCID: PMC4558850 DOI: 10.1093/gbe/evv132] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022] Open
Abstract
Although transcriptomic profiling has become the standard approach for exploring molecular differences in the primate brain, very little is known about how the expression levels of gene transcripts relate to downstream protein abundance. Moreover, it is unknown whether the relationship changes depending on the brain region or species under investigation. We performed high-throughput transcriptomic (RNA-Seq) and proteomic (liquid chromatography coupled with tandem mass spectrometry) analyses on two regions of the human and chimpanzee brain: the anterior cingulate cortex and caudate nucleus. In both brain regions, we found a lower correlation between mRNA and protein expression levels in humans and chimpanzees than has been reported for other tissues and cell types, suggesting that the brain may engage extensive tissue-specific regulation affecting protein abundance. In both species, only a few categories of biological function exhibited strong correlations between mRNA and protein expression levels. These categories included oxidative metabolism and protein synthesis and modification, indicating that the expression levels of mRNA transcripts supporting these biological functions are more predictive of protein expression compared with other functional categories. More generally, however, the two measures of molecular expression provided strikingly divergent perspectives into differential expression between human and chimpanzee brains: mRNA comparisons revealed significant differences in neuronal communication, ion transport, and regulatory processes, whereas protein comparisons indicated differences in perception and cognition, metabolic processes, and organization of the cytoskeleton. Our results highlight the importance of examining protein expression in evolutionary analyses and call for a more thorough understanding of tissue-specific protein expression levels.
Affiliation(s)
- Amy L Bauernfeind
- Department of Anatomy and Neurobiology, Washington University Medical School; Department of Anthropology, Washington University in St. Louis; Department of Anthropology and Center for the Advanced Study of Human Paleobiology, The George Washington University
| | - Erik J Soderblom
- Proteomics and Metabolomics Shared Resource, Duke University School of Medicine; Center for Genomic and Computational Biology, Duke University
| | - Meredith E Turner
- Proteomics and Metabolomics Shared Resource, Duke University School of Medicine; Center for Genomic and Computational Biology, Duke University
| | - M Arthur Moseley
- Proteomics and Metabolomics Shared Resource, Duke University School of Medicine; Center for Genomic and Computational Biology, Duke University
| | | | - Patrick R Hof
- Fishberg Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York; New York Consortium in Evolutionary Primatology, New York, New York
| | - Chet C Sherwood
- Department of Anthropology and Center for the Advanced Study of Human Paleobiology, The George Washington University
| | - Gregory A Wray
- Center for Genomic and Computational Biology, Duke University; Department of Biology, Duke University; Department of Evolutionary Anthropology, Duke University
| | | |
|
21
|
Sikdar S, Gill R, Datta S. Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 2015; 17:262-9. [PMID: 26141827 DOI: 10.1093/bib/bbv043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2015] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION: Many approaches have been proposed for the protein identification problem based on tandem mass spectrometry (MS/MS) data. In these experiments, proteins are digested into peptides and the resulting peptide mixture is subjected to mass spectrometry. Some interesting putative peptide features (peaks) are selected from the mass spectra. Following that, the precursor ions undergo fragmentation and are analyzed by MS/MS. The process of identification of peptides from the mass spectra and the constituent proteins in the sample is called protein identification from MS/MS data. There are many two-step protein identification procedures, reviewed in the literature, which first attempt to identify the peptides in a separate process and then use these results to infer the proteins. However, in recent years, there have been attempts to provide a one-step solution to protein identification, which simultaneously identifies the proteins and the peptides in the sample. RESULTS: In this review, we briefly introduce the most popular two-step protein identification procedure, PeptideProphet coupled with ProteinProphet. Following that, we describe the difficulties with two-step procedures and review some recently introduced one-step protein/peptide identification procedures that do not suffer from these issues. The focus of this review is on one-step procedures that are based on statistical likelihood-based models, but some discussion of other one-step procedures is also included. We report comparative performances of one-step and two-step methods, which support the overall superiority of one-step procedures. We also cover some recent efforts to improve protein identification by incorporating other molecular data along with MS/MS data.
|
22
|
Uszkoreit J, Maerkens A, Perez-Riverol Y, Meyer HE, Marcus K, Stephan C, Kohlbacher O, Eisenacher M. PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface. J Proteome Res 2015; 14:2988-97. [DOI: 10.1021/acs.jproteome.5b00121] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 12/23/2022]
Affiliation(s)
- Julian Uszkoreit
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Alexandra Maerkens
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Helmut E. Meyer
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Katrin Marcus
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Christian Stephan
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Oliver Kohlbacher
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Martin Eisenacher
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
|
23
|
Väremo L, Scheele C, Broholm C, Mardinoglu A, Kampf C, Asplund A, Nookaew I, Uhlén M, Pedersen BK, Nielsen J. Proteome- and transcriptome-driven reconstruction of the human myocyte metabolic network and its use for identification of markers for diabetes. Cell Rep 2015; 11:921-933. [PMID: 25937284 DOI: 10.1016/j.celrep.2015.04.010] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 12/09/2014] [Revised: 02/06/2015] [Accepted: 04/03/2015] [Indexed: 11/16/2022] Open
Abstract
Skeletal myocytes are metabolically active and susceptible to insulin resistance and are thus implicated in type 2 diabetes (T2D). This complex disease involves systemic metabolic changes, and their elucidation at the systems level requires genome-wide data and biological networks. Genome-scale metabolic models (GEMs) provide a network context for the integration of high-throughput data. We generated myocyte-specific RNA-sequencing data and investigated their correlation with proteome data. These data were then used to reconstruct a comprehensive myocyte GEM. Next, we performed a meta-analysis of six studies comparing muscle transcription in T2D versus healthy subjects. Transcriptional changes were mapped on the myocyte GEM, revealing extensive transcriptional regulation in T2D, particularly around pyruvate oxidation, branched-chain amino acid catabolism, and tetrahydrofolate metabolism, connected through the downregulated dihydrolipoamide dehydrogenase. Strikingly, the gene signature underlying this metabolic regulation successfully classifies the disease state of individual samples, suggesting that regulation of these pathways is a ubiquitous feature of myocytes in response to T2D.
Affiliation(s)
- Leif Väremo
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Camilla Scheele
- Centre of Inflammation and Metabolism and Centre for Physical Activity Research, Department of Infectious Diseases, Rigshospitalet, University of Copenhagen, 2100 Copenhagen Ø, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christa Broholm
- Centre of Inflammation and Metabolism and Centre for Physical Activity Research, Department of Infectious Diseases, Rigshospitalet, University of Copenhagen, 2100 Copenhagen Ø, Denmark
- Adil Mardinoglu
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Caroline Kampf
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
- Anna Asplund
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
- Intawat Nookaew
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Mathias Uhlén
- Department of Proteomics, School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), 10691 Stockholm, Sweden; Science for Life Laboratory, Royal Institute of Technology (KTH), 17121 Stockholm, Sweden
- Bente Klarlund Pedersen
- Centre of Inflammation and Metabolism and Centre for Physical Activity Research, Department of Infectious Diseases, Rigshospitalet, University of Copenhagen, 2100 Copenhagen Ø, Denmark
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden; Science for Life Laboratory, Royal Institute of Technology (KTH), 17121 Stockholm, Sweden
|
24
|
Shanmugam AK, Yocum AK, Nesvizhskii AI. Utility of RNA-seq and GPMDB protein observation frequency for improving the sensitivity of protein identification by tandem MS. J Proteome Res 2014; 13:4113-9. [PMID: 25026199 PMCID: PMC4156250 DOI: 10.1021/pr500496p] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/17/2022]
Abstract
Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for protein identification in proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. In general, these methods do not utilize any information other than spectral data for protein identification. However, considering the wealth of external data available for many biological systems, analysis methods can incorporate such information to improve the sensitivity of protein identification. In this study, we present a method to utilize Global Proteome Machine Database identification frequencies and RNA-seq transcript abundances to adjust the confidence scores of protein identifications. The method described is particularly useful for samples with low-to-moderate proteome coverage (i.e., <2000–3000 proteins), where we observe up to an 8% improvement in the number of proteins identified at a 1% false discovery rate.
Affiliation(s)
- Avinash K Shanmugam
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
|
25
|
Recent updates on drug abuse analyzed by neuroproteomics studies: Cocaine, Methamphetamine and MDMA. Translational Proteomics 2014. [DOI: 10.1016/j.trprot.2014.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023] Open
|
26
|
Parts L, Liu YC, Tekkedil MM, Steinmetz LM, Caudy AA, Fraser AG, Boone C, Andrews BJ, Rosebrock AP. Heritability and genetic basis of protein level variation in an outbred population. Genome Res 2014; 24:1363-70. [PMID: 24823668 PMCID: PMC4120089 DOI: 10.1101/gr.170506.113] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 01/10/2023]
Abstract
The genetic basis of heritable traits has been studied for decades. Although recent mapping efforts have elucidated genetic determinants of transcript levels, mapping of protein abundance has lagged. Here, we analyze levels of 4084 GFP-tagged yeast proteins in the progeny of a cross between a laboratory and a wild strain using flow cytometry and high-content microscopy. The genotype of trans variants contributed little to protein level variation between individual cells but explained >50% of the variance in the population’s average protein abundance for half of the GFP fusions tested. To map trans-acting factors responsible, we performed flow sorting and bulk segregant analysis of 25 proteins, finding a median of five protein quantitative trait loci (pQTLs) per GFP fusion. Further, we find that cis-acting variants predominate; the genotype of a gene and its surrounding region had a large effect on protein level six times more frequently than the rest of the genome combined. We present evidence for both shared and independent genetic control of transcript and protein abundance: More than half of the expression QTLs (eQTLs) contribute to changes in protein levels of regulated genes, but several pQTLs do not affect their cognate transcript levels. Allele replacements of genes known to underlie trans eQTL hotspots confirmed the correlation of effects on mRNA and protein levels. This study represents the first genome-scale measurement of genetic contribution to protein levels in single cells and populations, identifies more than a hundred trans pQTLs, and validates the propagation of effects associated with transcript variation to protein abundance.
Affiliation(s)
- Leopold Parts
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S3E1, Canada
- Yi-Chun Liu
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada
- Manu M Tekkedil
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Lars M Steinmetz
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Amy A Caudy
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S3E1, Canada
- Andrew G Fraser
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S3E1, Canada
- Charles Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S3E1, Canada
- Brenda J Andrews
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, M5S3E1, Canada
- Adam P Rosebrock
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, M5S3E1, Canada;
|
27
|
Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve peptide and protein identification in shotgun proteomics. J Proteome Res 2014; 13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/26/2022]
Abstract
Mass spectrometry (MS)-based shotgun proteomics is an effective technology for global proteome profiling. The ultimate goal is to assign tandem MS spectra to peptides and subsequently infer proteins and their abundance. In addition to database searching and protein assembly algorithms, computational approaches have been developed to integrate genomic, transcriptomic, and interactome information to improve peptide and protein identification. Earlier efforts focus primarily on making databases more comprehensive using publicly available genomic and transcriptomic data. More recently, with the increasing affordability of the Next Generation Sequencing (NGS) technologies, personalized protein databases derived from sample-specific genomic and transcriptomic data have emerged as an attractive strategy. In addition, incorporating interactome data not only improves protein identification but also puts identified proteins into their functional context and thus facilitates data interpretation. In this paper, we survey the major integrative bioinformatics approaches that have been developed during the past decade and discuss their merits and demerits.
Affiliation(s)
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, and Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
|
28
|
Goh WWB, Wong L. Computational proteomics: designing a comprehensive analytical strategy. Drug Discov Today 2014; 19:266-74. [DOI: 10.1016/j.drudis.2013.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/04/2013] [Revised: 06/28/2013] [Accepted: 07/11/2013] [Indexed: 02/02/2023]
|
29
|
Abstract
MOTIVATION Statistical validation of protein identifications is an important issue in shotgun proteomics. The false discovery rate (FDR) is a powerful statistical tool for evaluating the protein identification result. Several research efforts have been made for FDR estimation at the protein level. However, there are still certain drawbacks in the existing FDR estimation methods based on the target-decoy strategy. RESULTS In this article, we propose a decoy-free protein-level FDR estimation method. Under the null hypothesis that each candidate protein matches an identified peptide totally at random, we assign statistical significance to protein identifications in terms of the permutation P-value and use these P-values to calculate the FDR. Our method consists of three key steps: (i) generating random bipartite graphs with the same structure; (ii) calculating the protein scores on these random graphs; and (iii) calculating the permutation P value and final FDR. As it is time-consuming or prohibitive to execute the protein inference algorithms for thousands of times in step ii, we first train a linear regression model using the original bipartite graph and identification scores provided by the target inference algorithm. Then we use the learned regression model as a substitute of original protein inference method to predict protein scores on shuffled graphs. We test our method on six public available datasets. The results show that our method is comparable with those state-of-the-art algorithms in terms of estimation accuracy. AVAILABILITY The source code of our algorithm is available at: https://sourceforge.net/projects/plfdr/
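The last of the three steps in this abstract (permutation P-values, then FDR) can be illustrated with a minimal sketch. It assumes the scores on the shuffled graphs are already available as plain lists, so the regression surrogate from the abstract is not modelled. The `+1` correction and the Benjamini-Hochberg adjustment are common conventions chosen here for illustration, not details confirmed by the abstract, and all function names and numbers are hypothetical.

```python
from typing import Dict, List


def permutation_p_values(observed: Dict[str, float],
                         null_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """P-value per protein: share of shuffled-graph scores >= the observed score.

    The +1 in numerator and denominator keeps p-values strictly positive;
    this convention is an assumption, not taken from the abstract.
    """
    p_values = {}
    for protein, score in observed.items():
        nulls = null_scores[protein]
        exceed = sum(1 for s in nulls if s >= score)
        p_values[protein] = (exceed + 1) / (len(nulls) + 1)
    return p_values


def benjamini_hochberg(p_values: Dict[str, float]) -> Dict[str, float]:
    """Convert p-values to FDR-adjusted q-values (Benjamini-Hochberg)."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    q_values = {}
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        protein, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        q_values[protein] = running_min
    return q_values


# Hypothetical scores: one observed score and four shuffled-graph scores each.
observed = {"P1": 0.95, "P2": 0.40, "P3": 0.90}
null_scores = {
    "P1": [0.10, 0.20, 0.30, 0.15],
    "P2": [0.50, 0.45, 0.30, 0.60],
    "P3": [0.20, 0.95, 0.10, 0.30],
}
p_vals = permutation_p_values(observed, null_scores)
q_vals = benjamini_hochberg(p_vals)
```

With only four shuffles the smallest attainable p-value is 0.2, which shows why the abstract speaks of running the scoring thousands of times and motivates a cheap surrogate for the inference algorithm.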
Affiliation(s)
- Ben Teng
- School of Software, Dalian University of Technology, Dalian 116621, China
|
30
|
Xiao CL, Chen XZ, Du YL, Li ZF, Wei L, Zhang G, He QY. Dispec: a novel peptide scoring algorithm based on peptide matching discriminability. PLoS One 2013; 8:e62724. [PMID: 23675420 PMCID: PMC3652849 DOI: 10.1371/journal.pone.0062724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/18/2012] [Accepted: 03/25/2013] [Indexed: 11/20/2022] Open
Abstract
Identifying peptides from the fragmentation spectra is a fundamental step in mass spectrometry (MS) data processing. The significance (discriminability) of every peak varies, providing additional information for potentially enhancing the identification sensitivity and the correct match rate. However this important information was not considered in previous algorithms. Here we presented a novel method based on Peptide Matching Discriminability (PMD), in which the PMD information of every peak reflects the discriminability of candidate peptides. In addition, we developed a novel peptide scoring algorithm Dispec based on PMD, by taking three aspects of discriminability into consideration: PMD, intensity discriminability and m/z error discriminability. Compared with Mascot and Sequest, Dispec identified remarkably more peptides from three experimental datasets with the same confidence at 1% PSM-level FDR. Dispec is also robust and versatile for various datasets obtained on different instruments. The concept of discriminability enhances the peptide identification and thus may contribute largely to the proteome studies. As an open-source program, Dispec is freely available at http://bioinformatics.jnu.edu.cn/software/dispec/.
Affiliation(s)
- Chuan-Le Xiao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Xiao-Zhou Chen
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Yang-Li Du
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Zhe-Fu Li
- Jinan University Network and Educational Technology Center, Guangzhou, China
- Li Wei
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
|
31
|
Chung C, Emili A, Frey BJ. Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry. Bioinformatics 2013; 29:821-9. [DOI: 10.1093/bioinformatics/btt056] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
|
32
|
Huang T, Gong H, Yang C, He Z. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. Comput Biol Chem 2013; 43:46-54. [PMID: 23385215 DOI: 10.1016/j.compbiolchem.2012.12.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/26/2012] [Revised: 12/30/2012] [Accepted: 12/30/2012] [Indexed: 11/28/2022]
Abstract
Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explain the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders still remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm is named as ProteinLasso, which explores an ensemble learning strategy to address the sparsity parameter selection problem in Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms those state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
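To make the idea of a constrained Lasso solved by coordinate descent concrete, here is a small sketch. The abstract does not spell out the exact constraints or objective, so this assumes one plausible form: peptide probabilities `y` are approximated by a detectability matrix `D` times protein probabilities `x`, with an L1 penalty `lam` and `x` restricted to the range 0 to 1. The function name, matrix layout, and numbers are hypothetical, and the ensemble strategy for choosing the sparsity parameter is not modelled.

```python
from typing import List


def protein_lasso_sketch(D: List[List[float]], y: List[float],
                         lam: float = 0.1, n_iter: int = 100) -> List[float]:
    """Coordinate descent for 0.5*||y - Dx||^2 + lam*sum(x) with 0 <= x <= 1.

    D[i][j] is the assumed detectability of peptide i from protein j.
    """
    n_pep, n_prot = len(D), len(D[0])
    x = [0.0] * n_prot
    for _ in range(n_iter):
        for j in range(n_prot):
            col_norm = sum(D[i][j] ** 2 for i in range(n_pep))
            if col_norm == 0.0:
                continue  # protein j explains no peptide; leave it at zero
            # Correlation of column j with the residual that excludes protein j.
            rho = sum(
                D[i][j] * (y[i] - sum(D[i][k] * x[k] for k in range(n_prot) if k != j))
                for i in range(n_pep)
            )
            # Closed-form coordinate minimum, clipped to the box [0, 1].
            x[j] = min(1.0, max(0.0, (rho - lam) / col_norm))
    return x


# Peptide 1 is unique to protein A, peptide 2 is shared, peptide 3 is unique to B.
D = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 1.0]]
y = [0.9, 0.9, 0.0]  # hypothetical peptide identification probabilities
x = protein_lasso_sketch(D, y, lam=0.1)
```

In this toy case the shared peptide is fully explained by protein A, so the penalty drives protein B to exactly zero, which is the behaviour the abstract relies on for handling peptide degeneracy.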
Affiliation(s)
- Ting Huang
- School of Software, Dalian University of Technology, China
|
33
|
Xiao CL, Chen XZ, Du YL, Sun X, Zhang G, He QY. Binomial Probability Distribution Model-Based Protein Identification Algorithm for Tandem Mass Spectrometry Utilizing Peak Intensity Information. J Proteome Res 2012; 12:328-35. [DOI: 10.1021/pr300781t] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Chuan-Le Xiao
- Institute of Life and Health Engineering, Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Jinan University, Guangzhou 510632, China
- Xiao-Zhou Chen
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650031, China
- Yang-Li Du
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650031, China
- Xuesong Sun
- Institute of Life and Health Engineering, Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Jinan University, Guangzhou 510632, China
- Gong Zhang
- Institute of Life and Health Engineering, Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Jinan University, Guangzhou 510632, China
- Qing-Yu He
- Institute of Life and Health Engineering, Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Jinan University, Guangzhou 510632, China
|
34
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, Bloomington 150 S, Woodlawn Avenue, Bloomington, Indiana 47405, USA
|
35
|
Huang T, He Z. A linear programming model for protein inference problem in shotgun proteomics. Bioinformatics 2012; 28:2956-62. [PMID: 22954624 DOI: 10.1093/bioinformatics/bts540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
MOTIVATION Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. RESULTS In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. AVAILABILITY The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. CONTACT zyhe@dlut.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Ting Huang
- School of Software, Dalian University of Technology, Dalian 116621, China
|
36
|
Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet 2012; 13:227-32. [PMID: 22411467 DOI: 10.1038/nrg3185] [Citation(s) in RCA: 2663] [Impact Index Per Article: 221.9] [Indexed: 02/06/2023]
Abstract
Recent advances in next-generation DNA sequencing and proteomics provide an unprecedented ability to survey mRNA and protein abundances. Such proteome-wide surveys are illuminating the extent to which different aspects of gene expression help to regulate cellular protein abundances. Current data demonstrate a substantial role for regulatory processes occurring after mRNA is made - that is, post-transcriptional, translational and protein degradation regulation - in controlling steady-state protein abundances. Intriguing observations are also emerging in relation to cells following perturbation, single-cell studies and the apparent evolutionary conservation of protein and mRNA abundances. Here, we summarize current understanding of the major factors regulating protein expression.
Affiliation(s)
- Christine Vogel
- Center for Genomics and Systems Biology, New York University, New York 10003, USA.
|
37
|
|
38
|
Choi H, Pavelka N. When one and one gives more than two: challenges and opportunities of integrative omics. Front Genet 2012; 2:105. [PMID: 22303399 PMCID: PMC3262227 DOI: 10.3389/fgene.2011.00105] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/11/2011] [Accepted: 12/21/2011] [Indexed: 12/24/2022] Open
Abstract
Since the dawn of the post-genomic era a myriad of novel high-throughput technologies have been developed that are capable of measuring thousands of biological molecules at once, giving rise to various “omics” platforms. These advances offer the unique opportunity to study how individual parts of a biological system work together to produce emerging phenotypes. Today, many research laboratories are moving toward applying multiple omics platforms to analyze the same biological samples. In addition, network information of interacting molecules is being incorporated more and more into the analysis and interpretation of these multiple omics datasets, which provides novel ways to integrate multiple layers of heterogeneous biological information into a single coherent picture. Here, we provide a perspective on how such recent “integrative omics” efforts are likely going to shift biological paradigms once again, and what challenges lie ahead.
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
|
39
|
Wang X, Slebos RJC, Wang D, Halvey PJ, Tabb DL, Liebler DC, Zhang B. Protein identification using customized protein sequence databases derived from RNA-Seq data. J Proteome Res 2011; 11:1009-17. [PMID: 22103967 DOI: 10.1021/pr200766z] [Citation(s) in RCA: 132] [Impact Index Per Article: 10.2] [Indexed: 11/28/2022]
Abstract
The standard shotgun proteomics data analysis strategy relies on searching MS/MS spectra against a context-independent protein sequence database derived from the complete genome sequence of an organism. Because transcriptome sequence analysis (RNA-Seq) promises an unbiased and comprehensive picture of the transcriptome, we reason that a sample-specific protein database derived from RNA-Seq data can better approximate the real protein pool in the sample and thus improve protein identification. In this study, we have developed a two-step strategy for building sample-specific protein databases from RNA-Seq data. First, the database size is reduced by eliminating unexpressed or lowly expressed genes according to transcript quantification. Second, high-quality nonsynonymous coding single nucleotide variations (SNVs) are identified based on RNA-Seq data, and corresponding protein variants are added to the database. Using RNA-Seq and shotgun proteomics data from two colorectal cancer cell lines SW480 and RKO, we demonstrated that customized protein sequence databases could significantly increase the sensitivity of peptide identification, reduce ambiguity in protein assembly, and enable the detection of known and novel peptide variants. Thus, sample-specific databases from RNA-Seq data can enable more sensitive and comprehensive protein discovery in shotgun proteomics studies.
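The two-step strategy described in this abstract can be sketched as a simple filter followed by variant expansion. This is only an illustration under stated assumptions: expression values are keyed directly by protein identifier, variants are given as zero-based positions with a replacement residue, and the threshold `min_expression`, the function name, and the naming scheme for variant entries are all hypothetical rather than taken from the paper.

```python
from typing import Dict


def build_sample_database(reference: Dict[str, str],
                          expression: Dict[str, float],
                          variants: Dict[str, Dict[int, str]],
                          min_expression: float = 1.0) -> Dict[str, str]:
    """Return a sample-specific protein database (identifier -> sequence)."""
    database = {}
    for protein_id, sequence in reference.items():
        # Step 1: drop entries that are unexpressed or lowly expressed.
        if expression.get(protein_id, 0.0) < min_expression:
            continue
        database[protein_id] = sequence
        # Step 2: add one extra entry per nonsynonymous single-residue variant.
        for position, new_residue in variants.get(protein_id, {}).items():
            original = sequence[position]
            mutated = sequence[:position] + new_residue + sequence[position + 1:]
            # Entry name uses 1-based positions, e.g. GENE_A_T3S.
            database[f"{protein_id}_{original}{position + 1}{new_residue}"] = mutated
    return database


# Hypothetical inputs: two reference proteins, one of them barely expressed.
reference = {"GENE_A": "MKTAY", "GENE_B": "MSLLR"}
expression = {"GENE_A": 12.5, "GENE_B": 0.2}
variants = {"GENE_A": {2: "S"}}

sample_db = build_sample_database(reference, expression, variants)
```

Here `sample_db` keeps `GENE_A` and its variant entry while `GENE_B` is filtered out, mirroring how the smaller search space and the added variant sequences are meant to work together.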
Affiliation(s)
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
|
40
|
Nan X, Fu G, Zhao Z, Liu S, Patel RY, Liu H, Daga PR, Doerksen RJ, Dang X, Chen Y, Wilkins D. Leveraging domain information to restructure biological prediction. BMC Bioinformatics 2011; 12 Suppl 10:S22. [PMID: 22166097 PMCID: PMC3236845 DOI: 10.1186/1471-2105-12-s10-s22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Background It is commonly believed that including domain knowledge in a prediction model is desirable. However, representing and incorporating domain information in the learning process is, in general, a challenging problem. In this research, we consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task. The goal of this research is to develop an algorithm to identify discrete or categorical attributes that maximally simplify the learning task. Results We consider restructuring a supervised learning problem via a partition of the problem space using a discrete or categorical attribute. A naive approach exhaustively searches all the possible restructured problems. It is computationally prohibitive when the number of discrete or categorical attributes is large. We propose a metric to rank attributes according to their potential to reduce the uncertainty of a classification task. It is quantified as a conditional entropy achieved using a set of optimal classifiers, each of which is built for a sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate the solution by the expected minimum conditional entropy with respect to random projections. This approach is tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem, i.e., the performance of the classifier built for the restructured problem always beats that of the original problem. 
Conclusions The proposed conditional entropy based metric is effective in identifying good partitions of a classification problem, hence enhancing the prediction performance.
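The ranking idea in this abstract can be illustrated with a much simpler stand-in. The sketch below ranks categorical attributes by the plain empirical conditional entropy of the class labels given the attribute, which is an assumption made for brevity: the abstract's metric is instead computed from optimal classifiers per sub-problem and approximated with random projections, neither of which is modelled here. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the empirical label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def conditional_entropy(attribute: Sequence[str], labels: Sequence[str]) -> float:
    """H(labels | attribute): entropy within each partition, weighted by its size."""
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def rank_attributes(attributes: Dict[str, Sequence[str]],
                    labels: Sequence[str]) -> List[str]:
    """Attributes sorted from most to least useful partition (lowest entropy first)."""
    return sorted(attributes, key=lambda name: conditional_entropy(attributes[name], labels))


labels = ["pos", "pos", "neg", "neg"]
attributes = {
    "batch": ["x", "y", "x", "y"],   # each partition still mixes both classes
    "tissue": ["a", "a", "b", "b"],  # each partition is pure
}
ranking = rank_attributes(attributes, labels)
```

Partitioning by `tissue` leaves no uncertainty about the label, while partitioning by `batch` leaves a full bit, so `tissue` is ranked first.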
Affiliation(s)
- Xiaofei Nan
- Department of Computer and Information Science, University of Mississippi, USA
|
41
|
Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res 2011; 10:2949-58. [PMID: 21488652 DOI: 10.1021/pr2002116] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
Affiliation(s)
- Taejoon Kwon
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, USA
|
42
|
Vogel C, Abreu RDS, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol 2011; 6:400. [PMID: 20739923 PMCID: PMC2947365 DOI: 10.1038/msb.2010.59] [Citation(s) in RCA: 458] [Impact Index Per Article: 35.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2010] [Accepted: 06/29/2010] [Indexed: 11/23/2022] Open
Abstract
We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (Rs=0.46, R²=0.29, P-value<2e-16), although non-linear. Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, as well as translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration. When integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding and 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.
mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005), which enable the dynamics of cellular systems and their responses to external and internal stimuli without having to rely exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, <50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009). Given the plethora of regulatory mechanisms involved, most studies have focused so far on individual regulators and specific targets. Particularly in humans, we currently lack system-wide, quantitative analyses that evaluate the relative contribution of regulatory elements encoded in the mRNA and protein sequence. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis on the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such sets available today for human. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.
Three types of tests were conducted: (a) we examined partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), we compared two extreme populations with Welch's t-test and (c) using a Multivariate Adaptive Regression Splines model, we analyzed the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we use non-parametric approaches throughout the analysis. We observed a significant positive correlation between mRNA and protein concentrations, larger than many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation is at least as important as the contribution of mRNA transcription and stability to the abundance variation of the final protein products. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and also nucleotide frequencies in the coding region are of strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% of expression variation unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as expression and measurement noise.
Our combined model including mRNA concentration and sequence features can explain 67% of the variation of protein abundance in this system, and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).
Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.
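The partial Spearman's rank correlation mentioned in test (a) of the abstract above can be sketched as follows, assuming the standard first-order partial correlation formula applied to rank-transformed data. This is only an illustrative sketch, not the authors' implementation; the helper names and the six data points are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """Rank correlation of x and y after accounting for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: sequence length, protein and mRNA concentration.
length = [300, 1200, 450, 900, 2000, 150]
protein = [8.1, 3.2, 4.9, 7.4, 2.5, 9.0]
mrna = [5.0, 4.1, 6.2, 3.9, 4.5, 5.5]
rho = partial_spearman(length, protein, mrna)
```

Here `rho` measures how strongly length and protein concentration co-vary in rank once the rank association of each with mRNA concentration has been removed.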
Affiliation(s)
- Christine Vogel
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78229-3900, USA.
|
43
|
Veremieva M, Khoruzhenko A, Zaicev S, Negrutskii B, El'skaya A. Unbalanced expression of the translation complex eEF1 subunits in human cardioesophageal carcinoma. Eur J Clin Invest 2011; 41:269-76. [PMID: 20964681 DOI: 10.1111/j.1365-2362.2010.02404.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
BACKGROUND The signalling role of individual subunits released from some stable translation multi-molecular complexes under unfavourable circumstances is known. The disease-related role of the translation elongation factor 1 complex (eEF1) as a whole has never been studied; however, its subunits possess apparent regulatory potency. Whether the individual eEF1 subunits can exist and function in the cell outside the complex is not known. MATERIALS AND METHODS The protein and mRNA levels of the A1, Bα, Bβ or Bγ subunits of eEF1 were analysed by Western and Northern blot techniques in the same specimens of cardioesophageal carcinoma and correspondingly paired normal tissues. Cancer-induced changes in localization patterns of the eEF1 subunits were examined immunohistochemically. RESULTS Changes in the expression of different eEF1 subunits were found to be unbalanced, indicating cancer-related emergence of individual components of the eEF1 complex. Independent overexpression of at least one eEF1 component was observed in 72% of clinical samples. Noncomplexed eEF1B subunits were also detected by immunohistochemical analysis. In the normal tissue, localization of the Bα, Bβ and Bγ subunits was nuclear-cytoplasmic, while in the cancer tissue only the Bγ subunit remained in the nucleus. CONCLUSIONS Our data are the first to indicate that the individual subunits can exist separately from the eEF1B complex in cancer tissues and that disintegration of eEF1B could be an important sign of cancer development. Nuclear localization of Bγ both in normal and in cancer tissues suggests its previously unknown nucleus-specific role in human cells.
Affiliation(s)
- Marina Veremieva
- Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kiev, Ukraine
|
44
|
Torres-García W, Brown SD, Johnson RH, Zhang W, Runger GC, Meldrum DR. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets. MOLECULAR BIOSYSTEMS 2011; 7:1093-104. [PMID: 21212895 DOI: 10.1039/c0mb00260g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured for 45 and 90 min. With the ultimate objective of imputing protein values for experimentally undetected samples at 45 and 90 min, we applied a serial set of algorithms to capture relationships between temporal gene and protein expression. This work follows four main steps: (1) a quality control step for gene expression reliability, (2) mRNA imputation, (3) protein prediction, and (4) validation. Initially, an S control chart approach is performed on gene expression replicates to remove unwanted variability. Then, we focused on the missing measurements of gene expression through a nonlinear Smoothing Splines Curve Fitting. This method identifies temporal relationships among transcriptomic data at different time points and enables imputation of mRNA abundance at 45 min. After mRNA imputation was validated by biological constraints (i.e. operons), we used a data-driven GBT model to impute protein abundance for the proteins experimentally undetected in the 45 and 90 min samples, based on relevant predictors such as temporal mRNA gene expression data and cellular functional roles.
The imputed protein values were validated using biological constraints such as operon and pathway information through a permutation test to investigate whether dispersion measures are indeed smaller for known biological groups than for any set of random genes. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
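A permutation test of the kind described in this abstract, asking whether a known biological group is less dispersed than random gene sets of the same size, might look like the following minimal sketch. The choice of population variance as the dispersion measure, the function name, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
import random
from statistics import pvariance


def permutation_pvalue(values, group, n_perm=2000, seed=0):
    """Estimate how often a random gene set of the same size is at least as
    tightly clustered (variance <= observed) as the known group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = list(values)
    observed = pvariance([values[g] for g in group])
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(group))
        if pvariance([values[g] for g in sample]) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical imputed abundances: 20 genes, four of which form an "operon"
# with nearly identical values.
values = {f"g{i}": float(i) for i in range(20)}
values.update({"g0": 5.0, "g1": 5.1, "g2": 4.9, "g3": 5.05})
operon = ["g0", "g1", "g2", "g3"]
p_value = permutation_pvalue(values, operon)
```

A small `p_value` indicates that the group's dispersion is unusually low compared with random sets, which is the pattern expected if imputed values respect the grouping.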
Affiliation(s)
- Wandaliz Torres-García
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-5906, USA.
|
45
|
Schrimpf SP, Hengartner MO. A worm rich in protein: Quantitative, differential, and global proteomics in Caenorhabditis elegans. J Proteomics 2010; 73:2186-97. [DOI: 10.1016/j.jprot.2010.03.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2010] [Accepted: 03/29/2010] [Indexed: 12/26/2022]
|
46
|
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
|
47
|
Protein and gene model inference based on statistical modeling in k-partite graphs. Proc Natl Acad Sci U S A 2010; 107:12101-6. [PMID: 20562346 DOI: 10.1073/pnas.0907654107] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
|
48
|
Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010; 17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
Affiliation(s)
- Jesse M Fox
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA
|
49
|
Maier T, Güell M, Serrano L. Correlation of mRNA and protein in complex biological samples. FEBS Lett 2010; 583:3966-73. [PMID: 19850042 DOI: 10.1016/j.febslet.2009.10.036] [Citation(s) in RCA: 1235] [Impact Index Per Article: 88.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2009] [Revised: 10/09/2009] [Accepted: 10/14/2009] [Indexed: 01/12/2023]
Abstract
The correlation between mRNA and protein abundances in the cell has been reported to be notoriously poor. Recent technological advances in the quantitative analysis of mRNA and protein species in complex samples allow the detailed analysis of this pathway at the center of biological systems. We give an overview of available methods for the identification and quantification of free and ribosome-bound mRNA, protein abundances and individual protein turnover rates. We review available literature on the correlation of mRNA and protein abundances and discuss biological and technical parameters influencing the correlation of these central biological molecules.
Affiliation(s)
- Tobias Maier
- Center for Genomic Regulation, Barcelona, Spain.
|
50
|
Ochs MF. Knowledge-based data analysis comes of age. Brief Bioinform 2010; 11:30-9. [PMID: 19854753 PMCID: PMC3700349 DOI: 10.1093/bib/bbp044] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2009] [Revised: 09/03/2009] [Indexed: 12/16/2022] Open
Abstract
The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.
Affiliation(s)
- Michael F Ochs
- Division of Oncology Biostatistics and Bioinformatics, 550 North Broadway, Suite 1103, Johns Hopkins University, Baltimore, MD 21205, USA.
|