Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ramakrishnan SR, Vogel C, Prince JT, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP, Wang R. Integrating shotgun proteomics and mRNA expression data to improve protein identification. Bioinformatics 2009;25:1397-403. [PMID: 19318424 PMCID: PMC2682515 DOI: 10.1093/bioinformatics/btp168] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 12/24/2008] [Revised: 02/19/2009] [Accepted: 03/18/2009] [Indexed: 01/24/2023]

Cited by Other Article(s)

Bollon J, Shortreed MR, Jordan BT, Miller R, Jeffery E, Cavalli A, Smith LM, Dewey C, Sheynkman GM, Tiberi S. IsoBayes: a Bayesian approach for single-isoform proteomics inference. bioRxiv 2024:2024.06.10.598223. [PMID: 38915658 PMCID: PMC11195044 DOI: 10.1101/2024.06.10.598223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]

Abstract

Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is bottom-up mass spectrometry proteomics, which returns peptide identifications that are indirectly used to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample whether a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of the selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we: i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances correlate highly with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package, and is accompanied by an example usage vignette.

Bishop DJ, Hoffman NJ, Taylor DF, Saner NJ, Lee MJC, Hawley JA. Discordant skeletal muscle gene and protein responses to exercise. Trends Biochem Sci 2023;48:927-936. [PMID: 37709636 DOI: 10.1016/j.tibs.2023.08.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/16/2023] [Revised: 08/07/2023] [Accepted: 08/16/2023] [Indexed: 09/16/2023]

Feng S, Ji HL, Wang H, Zhang B, Sterzenbach R, Pan C, Guo X. MetaLP: An integrative linear programming method for protein inference in metaproteomics. PLoS Comput Biol 2022;18:e1010603. [PMID: 36269761 PMCID: PMC9629623 DOI: 10.1371/journal.pcbi.1010603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 11/02/2022] [Accepted: 09/26/2022] [Indexed: 11/07/2022]

Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022;23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]

Abstract

Background

Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach.

Results

We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible.

Conclusions

In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-022-02701-2.

Liu W, Liu Q, Zhang B, Lin Z, Li X, Yang X, Pu M, Zou R, He Z, Wang F, Dou K. The mRNA of TCTP functions as a sponge to maintain homeostasis of TCTP protein levels in hepatocellular carcinoma. Cell Death Dis 2020;11:974. [PMID: 33184257 PMCID: PMC7665032 DOI: 10.1038/s41419-020-03149-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2020] [Revised: 10/10/2020] [Accepted: 10/13/2020] [Indexed: 01/01/2023]

Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020;29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022]

Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020;19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]

Du Y, Clair GC, Al Alam D, Danopoulos S, Schnell D, Kitzmiller JA, Misra RS, Bhattacharya S, Warburton D, Mariani TJ, Pryhuber GS, Whitsett JA, Ansong C, Xu Y. Integration of transcriptomic and proteomic data identifies biological functions in cell populations from human infant lung. Am J Physiol Lung Cell Mol Physiol 2019;317:L347-L360. [PMID: 31268347 DOI: 10.1152/ajplung.00475.2018] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]

Affiliation(s)

Yina Du The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
Geremy C Clair Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington
Denise Al Alam Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California; Keck School of Medicine, University of Southern California, Los Angeles, California
Soula Danopoulos Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California; Keck School of Medicine, University of Southern California, Los Angeles, California
Daniel Schnell Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Heart Institute and Center for Translational Fibrosis Research, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
Joseph A Kitzmiller The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
Ravi S Misra Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
Soumyaroop Bhattacharya Department of Pediatrics, University of Rochester Medical Center, Rochester, New York; Division of Neonatology and Program in Pediatric Molecular and Personalized Medicine, University of Rochester Medical Center, Rochester, New York
David Warburton Developmental Biology and Regenerative Medicine Program, Department of Pediatric Surgery, The Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California; Keck School of Medicine, University of Southern California, Los Angeles, California
Thomas J Mariani Department of Pediatrics, University of Rochester Medical Center, Rochester, New York; Division of Neonatology and Program in Pediatric Molecular and Personalized Medicine, University of Rochester Medical Center, Rochester, New York
Gloria S Pryhuber Department of Pediatrics, University of Rochester Medical Center, Rochester, New York
Jeffrey A Whitsett The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
Charles Ansong Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington
Yan Xu The Perinatal Institute and Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio

Zelic R, Fiano V, Ebot EM, Coseo Markt S, Grasso C, Trevisan M, De Marco L, Delsedime L, Zugna D, Mucci LA, Richiardi L. Single-nucleotide polymorphisms in DNMT3B gene and DNMT3B mRNA expression in association with prostate cancer mortality. Prostate Cancer Prostatic Dis 2019;22:284-291. [PMID: 30341411 DOI: 10.1038/s41391-018-0102-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/05/2018] [Revised: 09/04/2018] [Accepted: 09/08/2018] [Indexed: 01/02/2023]

Hibbert SA, Ozols M, Griffiths CEM, Watson REB, Bell M, Sherratt MJ. Defining tissue proteomes by systematic literature review. Sci Rep 2018;8:546. [PMID: 29323144 PMCID: PMC5765030 DOI: 10.1038/s41598-017-18699-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/18/2017] [Accepted: 12/14/2017] [Indexed: 12/24/2022]

Zhong J, Wang J, Ding X, Zhang Z, Li M, Wu FX, Pan Y. Protein Inference from the Integration of Tandem MS Data and Interactome Networks. IEEE/ACM Trans Comput Biol Bioinform 2017;14:1399-1409. [PMID: 28113634 DOI: 10.1109/tcbb.2016.2601618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]

Abstract

Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. Recent studies exploit other information, besides tandem MS data and peptide identification information, to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being present in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.

Tsai MA, Chen IH, Wang JH, Chou SJ, Li TH, Leu MY, Ho HK, Yang WC. A probe-based qRT-PCR method to profile immunological gene expression in blood of captive beluga whales (Delphinapterus leucas). PeerJ 2017;5:e3840. [PMID: 28970970 PMCID: PMC5622604 DOI: 10.7717/peerj.3840] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/18/2017] [Accepted: 09/01/2017] [Indexed: 12/20/2022]

Langella O, Valot B, Balliau T, Blein-Nicolas M, Bonhomme L, Zivy M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J Proteome Res 2016;16:494-503. [DOI: 10.1021/acs.jproteome.6b00632] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 01/03/2023]

Protein inference: A protein quantification perspective. Comput Biol Chem 2016;63:21-29. [DOI: 10.1016/j.compbiolchem.2016.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/15/2016] [Accepted: 02/01/2016] [Indexed: 01/04/2023]

Zhao L, Chen Y, Bajaj AO, Eblimit A, Xu M, Soens ZT, Wang F, Ge Z, Jung SY, He F, Li Y, Wensel TG, Qin J, Chen R. Integrative subcellular proteomic analysis allows accurate prediction of human disease-causing genes. Genome Res 2016;26:660-9. [PMID: 26912414 PMCID: PMC4864458 DOI: 10.1101/gr.198911.115] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/13/2015] [Accepted: 02/19/2016] [Indexed: 12/04/2022]

Affiliation(s)

Li Zhao Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Yiyun Chen Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Amol Onkar Bajaj Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
Aiden Eblimit Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Mingchu Xu Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Zachry T Soens Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Feng Wang Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Zhongqi Ge Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Sung Yun Jung Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
Feng He Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
Yumei Li Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Theodore G Wensel Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
Jun Qin Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
Rui Chen Structural and Computational Biology and Molecular Biophysics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA

Moscovitz JE, Nahar MS, Shalat SL, Slitt AL, Dolinoy DC, Aleksunes LM. Correlation between Conjugated Bisphenol A Concentrations and Efflux Transporter Expression in Human Fetal Livers. Drug Metab Dispos 2016;44:1061-5. [PMID: 26851240 DOI: 10.1124/dmd.115.068668] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/30/2015] [Accepted: 02/04/2016] [Indexed: 12/14/2022]

Affiliation(s)

Jamie E Moscovitz Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey
Muna S Nahar Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan
Stuart L Shalat Division of Environmental Health, School of Public Health, Georgia State University, Atlanta, Georgia; Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey; Environmental and Occupational Health Sciences Institute, Piscataway, New Jersey
Angela L Slitt Department of Biomedical and Pharmaceutical Sciences, University of Rhode Island, Kingston, Rhode Island
Dana C Dolinoy Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan; Department of Nutritional Sciences, University of Michigan, Ann Arbor, Michigan
Lauren M Aleksunes Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey; Environmental and Occupational Health Sciences Institute, Piscataway, New Jersey

He Z, Huang T, Zhao C, Teng B. Protein Inference. Adv Exp Med Biol 2016;919:237-242. [PMID: 27975221 DOI: 10.1007/978-3-319-41448-5_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]

Zhao C, Liu D, Teng B, He Z. BagReg: Protein inference through machine learning. Comput Biol Chem 2015;57:12-20. [DOI: 10.1016/j.compbiolchem.2015.02.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 10/24/2022]

Bauernfeind AL, Reyzer ML, Caprioli RM, Ely JJ, Babbitt CC, Wray GA, Hof PR, Sherwood CC. High spatial resolution proteomic comparison of the brain in humans and chimpanzees. J Comp Neurol 2015;523:2043-61. [PMID: 25779868 DOI: 10.1002/cne.23777] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/17/2014] [Revised: 02/03/2015] [Accepted: 03/11/2015] [Indexed: 12/30/2022]

Bauernfeind AL, Soderblom EJ, Turner ME, Moseley MA, Ely JJ, Hof PR, Sherwood CC, Wray GA, Babbitt CC. Evolutionary Divergence of Gene and Protein Expression in the Brains of Humans and Chimpanzees. Genome Biol Evol 2015;7:2276-88. [PMID: 26163674 PMCID: PMC4558850 DOI: 10.1093/gbe/evv132] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022]

Abstract

Although transcriptomic profiling has become the standard approach for exploring molecular differences in the primate brain, very little is known about how the expression levels of gene transcripts relate to downstream protein abundance. Moreover, it is unknown whether the relationship changes depending on the brain region or species under investigation. We performed high-throughput transcriptomic (RNA-Seq) and proteomic (liquid chromatography coupled with tandem mass spectrometry) analyses on two regions of the human and chimpanzee brain: the anterior cingulate cortex and caudate nucleus. In both brain regions, we found a lower correlation between mRNA and protein expression levels in humans and chimpanzees than has been reported for other tissues and cell types, suggesting that the brain may engage extensive tissue-specific regulation affecting protein abundance. In both species, only a few categories of biological function exhibited strong correlations between mRNA and protein expression levels. These categories included oxidative metabolism and protein synthesis and modification, indicating that the expression levels of mRNA transcripts supporting these biological functions are more predictive of protein expression than those of other functional categories. More generally, however, the two measures of molecular expression provided strikingly divergent perspectives on differential expression between human and chimpanzee brains: mRNA comparisons revealed significant differences in neuronal communication, ion transport, and regulatory processes, whereas protein comparisons indicated differences in perception and cognition, metabolic processes, and organization of the cytoskeleton. Our results highlight the importance of examining protein expression in evolutionary analyses and call for a more thorough understanding of tissue-specific protein expression levels.

Sikdar S, Gill R, Datta S. Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 2015;17:262-9. [PMID: 26141827 DOI: 10.1093/bib/bbv043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2015] [Indexed: 01/28/2023]

Uszkoreit J, Maerkens A, Perez-Riverol Y, Meyer HE, Marcus K, Stephan C, Kohlbacher O, Eisenacher M. PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface. J Proteome Res 2015;14:2988-97. [DOI: 10.1021/acs.jproteome.5b00121] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 12/23/2022]

Väremo L, Scheele C, Broholm C, Mardinoglu A, Kampf C, Asplund A, Nookaew I, Uhlén M, Pedersen BK, Nielsen J. Proteome- and transcriptome-driven reconstruction of the human myocyte metabolic network and its use for identification of markers for diabetes. Cell Rep 2015;11:921-933. [PMID: 25937284 DOI: 10.1016/j.celrep.2015.04.010] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 12/09/2014] [Revised: 02/06/2015] [Accepted: 04/03/2015] [Indexed: 11/16/2022]

Shanmugam AK, Yocum AK, Nesvizhskii AI. Utility of RNA-seq and GPMDB protein observation frequency for improving the sensitivity of protein identification by tandem MS. J Proteome Res 2014;13:4113-9. [PMID: 25026199 PMCID: PMC4156250 DOI: 10.1021/pr500496p] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/17/2022]

Recent updates on drug abuse analyzed by neuroproteomics studies: Cocaine, Methamphetamine and MDMA. Transl Proteom 2014. [DOI: 10.1016/j.trprot.2014.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]

Parts L, Liu YC, Tekkedil MM, Steinmetz LM, Caudy AA, Fraser AG, Boone C, Andrews BJ, Rosebrock AP. Heritability and genetic basis of protein level variation in an outbred population. Genome Res 2014;24:1363-70. [PMID: 24823668 PMCID: PMC4120089 DOI: 10.1101/gr.170506.113] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 01/10/2023]

Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve Peptide and protein identification in shotgun proteomics. J Proteome Res 2014;13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/26/2022]

Goh WWB, Wong L. Computational proteomics: designing a comprehensive analytical strategy. Drug Discov Today 2014;19:266-74. [DOI: 10.1016/j.drudis.2013.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/04/2013] [Revised: 06/28/2013] [Accepted: 07/11/2013] [Indexed: 02/02/2023]

Teng B, Huang T, He Z. Decoy-free protein-level false discovery rate estimation. Bioinformatics 2013;30:675-81. [PMID: 23926225 DOI: 10.1093/bioinformatics/btt431] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Xiao CL, Chen XZ, Du YL, Li ZF, Wei L, Zhang G, He QY. Dispec: a novel peptide scoring algorithm based on peptide matching discriminability. PLoS One 2013;8:e62724. [PMID: 23675420 PMCID: PMC3652849 DOI: 10.1371/journal.pone.0062724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2012] [Accepted: 03/25/2013] [Indexed: 11/20/2022] Open

Chung C, Emili A, Frey BJ. Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry. Bioinformatics 2013;29:821-9. [DOI: 10.1093/bioinformatics/btt056] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Huang T, Gong H, Yang C, He Z. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. Comput Biol Chem 2013;43:46-54. [PMID: 23385215 DOI: 10.1016/j.compbiolchem.2012.12.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2012] [Revised: 12/30/2012] [Accepted: 12/30/2012] [Indexed: 11/28/2022]

Xiao CL, Chen XZ, Du YL, Sun X, Zhang G, He QY. Binomial Probability Distribution Model-Based Protein Identification Algorithm for Tandem Mass Spectrometry Utilizing Peak Intensity Information. J Proteome Res 2012;12:328-35. [DOI: 10.1021/pr300781t] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Li YF, Radivojac P. Computational approaches to protein inference in shotgun proteomics. BMC Bioinformatics 2012;13 Suppl 16:S4. [PMID: 23176300 PMCID: PMC3489551 DOI: 10.1186/1471-2105-13-s16-s4] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Huang T, He Z. A linear programming model for protein inference problem in shotgun proteomics. Bioinformatics 2012;28:2956-62. [PMID: 22954624 DOI: 10.1093/bioinformatics/bts540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet 2012;13:227-32. [PMID: 22411467 DOI: 10.1038/nrg3185] [Citation(s) in RCA: 2663] [Impact Index Per Article: 221.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Huang T, Wang J, Yu W, He Z. Protein inference: a review. Brief Bioinform 2012;13:586-614. [DOI: 10.1093/bib/bbs004] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open

Choi H, Pavelka N. When one and one gives more than two: challenges and opportunities of integrative omics. Front Genet 2012;2:105. [PMID: 22303399 PMCID: PMC3262227 DOI: 10.3389/fgene.2011.00105] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2011] [Accepted: 12/21/2011] [Indexed: 12/24/2022] Open

Wang X, Slebos RJC, Wang D, Halvey PJ, Tabb DL, Liebler DC, Zhang B. Protein identification using customized protein sequence databases derived from RNA-Seq data. J Proteome Res 2011;11:1009-17. [PMID: 22103967 DOI: 10.1021/pr200766z] [Citation(s) in RCA: 132] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Nan X, Fu G, Zhao Z, Liu S, Patel RY, Liu H, Daga PR, Doerksen RJ, Dang X, Chen Y, Wilkins D. Leveraging domain information to restructure biological prediction. BMC Bioinformatics 2011;12 Suppl 10:S22. [PMID: 22166097 PMCID: PMC3236845 DOI: 10.1186/1471-2105-12-s10-s22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

Background

It is commonly believed that including domain knowledge in a prediction model is desirable. However, representing and incorporating domain information in the learning process is, in general, a challenging problem. In this research, we consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task. The goal of this research is to develop an algorithm to identify discrete or categorical attributes that maximally simplify the learning task.

Results

We consider restructuring a supervised learning problem via a partition of the problem space using a discrete or categorical attribute. A naive approach exhaustively searches all the possible restructured problems. It is computationally prohibitive when the number of discrete or categorical attributes is large. We propose a metric to rank attributes according to their potential to reduce the uncertainty of a classification task. It is quantified as a conditional entropy achieved using a set of optimal classifiers, each of which is built for a sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate the solution by the expected minimum conditional entropy with respect to random projections. This approach is tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem, i.e., the performance of the classifier built for the restructured problem always beats that of the original problem.

Conclusions

The proposed conditional entropy based metric is effective in identifying good partitions of a classification problem, hence enhancing the prediction performance.
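The ranking idea in this abstract can be illustrated with a much simpler stand-in. The sketch below ranks categorical attributes by the plain empirical conditional entropy of the class labels given each attribute. It does not build per-partition classifiers or use random projections, so it is not the authors' metric; the function names and the toy data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(labels, attribute):
    """Empirical H(labels | attribute): weighted entropy within each partition."""
    groups = defaultdict(list)
    for y, a in zip(labels, attribute):
        groups[a].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def rank_attributes(labels, attributes):
    """Return attribute names sorted from lowest to highest H(labels | attribute).

    `attributes` maps a (hypothetical) attribute name to its list of values.
    """
    return sorted(attributes, key=lambda name: conditional_entropy(labels, attributes[name]))


# Toy data: "informative" splits the classes perfectly, "noise" does not.
labels = [0, 0, 1, 1]
attributes = {
    "noise": ["x", "y", "x", "y"],
    "informative": ["a", "a", "b", "b"],
}
ranking = rank_attributes(labels, attributes)
```

A lower conditional entropy means the partition leaves less uncertainty about the class, which is the sense in which an attribute "simplifies" the task here.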

Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res 2011;10:2949-58. [PMID: 21488652 DOI: 10.1021/pr2002116] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Vogel C, Abreu RDS, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol 2011;6:400. [PMID: 20739923 PMCID: PMC2947365 DOI: 10.1038/msb.2010.59] [Citation(s) in RCA: 458] [Impact Index Per Article: 35.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2010] [Accepted: 06/29/2010] [Indexed: 11/23/2022] Open

Abstract

We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (R_s=0.46, R²=0.29, P-value<2e-16), although non-linear.

Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, as well as translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration.

When integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding and 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.

mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005), which enable the dynamics of cellular systems and their responses to external and internal stimuli without having to rely exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, <50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009).

Given the plethora of regulatory mechanisms involved, most studies have focused so far on individual regulators and specific targets. Particularly in human, we currently lack system-wide, quantitative analyses that evaluate the relative contribution of regulatory elements encoded in the mRNA and protein sequence. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis on the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such sets available today for human. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.

Three types of tests have been conducted: (a) we examined partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), we compared two extreme populations with Welch's t-test and (c) using a Multivariate Adaptive Regression Splines model, we analyzed the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we use non-parametric approaches throughout the analysis.
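Test (a) can be illustrated with a first-order partial Spearman correlation: rank-transform each variable, compute the pairwise correlations, and remove the contribution of the controlling variable. The sketch below is a minimal pure-Python version under that assumption. The function names and the toy numbers are hypothetical, and the study's own implementation is not described in this listing.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after accounting for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy example: a sequence feature, protein concentration, mRNA concentration.
feature = [1, 2, 3, 4, 5]
protein = [2, 1, 4, 3, 5]
mrna = [5, 3, 1, 4, 2]
r_partial = partial_spearman(feature, protein, mrna)
```

Here `mrna` plays the role of the variable being accounted for, so `r_partial` measures how strongly the feature tracks protein concentration beyond what mRNA concentration already explains.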

We observed a significant positive correlation between mRNA and protein concentrations, larger than many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation is at least as important as the contribution of mRNA transcription and stability to the abundance variation of the final protein products. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and also nucleotide frequencies in the coding region are of strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% of the expression variation unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as expression and measurement noise.

Our combined model including mRNA concentration and sequence features can explain 67% of the variation of protein abundance in this system—and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).

Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.

Veremieva M, Khoruzhenko A, Zaicev S, Negrutskii B, El'skaya A. Unbalanced expression of the translation complex eEF1 subunits in human cardioesophageal carcinoma. Eur J Clin Invest 2011;41:269-76. [PMID: 20964681 DOI: 10.1111/j.1365-2362.2010.02404.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Torres-García W, Brown SD, Johnson RH, Zhang W, Runger GC, Meldrum DR. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets. Molecular BioSystems 2011;7:1093-104. [PMID: 21212895 DOI: 10.1039/c0mb00260g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Abstract

Despite significant improvements in recent years, proteomic datasets currently available still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, genes' expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured for 45 and 90 min. With the ultimate objective to impute protein values for experimentally undetected samples at 45 and 90 min, we applied a serial set of algorithms to capture relationships between temporal gene and protein expression. This work follows four main steps: (1) a quality control step for gene expression reliability, (2) mRNA imputation, (3) protein prediction, and (4) validation. Initially, an S control chart approach is performed on gene expression replicates to remove unwanted variability. Then, we focused on the missing measurements of gene expression through a nonlinear Smoothing Splines Curve Fitting. This method identifies temporal relationships among transcriptomic data at different time points and enables imputation of mRNA abundance at 45 min. After mRNA imputation was validated by biological constraints (i.e. operons), we used a data-driven GBT model to impute protein abundance for the proteins experimentally undetected in the 45 and 90 min samples, based on relevant predictors such as temporal mRNA gene expression data and cellular functional roles.
The imputed protein values were validated using biological constraints such as operon and pathway information through a permutation test to investigate whether dispersion measures are indeed smaller for known biological groups than for any set of random genes. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
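The validation step described here can be sketched as a permutation test. It compares the dispersion of values within a known biological group (such as an operon) against the dispersion of random gene sets of the same size. The code below is a minimal illustration under assumptions: population standard deviation stands in for the dispersion measure, and the gene identifiers, values, function name, and parameters are hypothetical rather than taken from the study.

```python
import random
from statistics import pstdev


def dispersion_permutation_pvalue(values, group, n_perm=1000, seed=0):
    """Estimate how often a random gene set is at least as tight as `group`.

    values: dict mapping gene id -> (imputed) abundance value
    group:  list of gene ids forming a known biological group
    Returns a permutation p-value with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = pstdev([values[g] for g in group])
    genes = list(values)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(group))
        if pstdev([values[g] for g in sample]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 hypothetical genes with evenly spread values; the "operon"
# consists of three genes with adjacent values, so its dispersion is small.
values = {f"g{i}": float(i) for i in range(20)}
operon = ["g0", "g1", "g2"]
p_value = dispersion_permutation_pvalue(values, operon)
```

A small p-value indicates that few random gene sets are as tightly clustered as the known group, which is the pattern the abstract describes checking for.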

Schrimpf SP, Hengartner MO. A worm rich in protein: Quantitative, differential, and global proteomics in Caenorhabditis elegans. J Proteomics 2010;73:2186-97. [DOI: 10.1016/j.jprot.2010.03.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2010] [Accepted: 03/29/2010] [Indexed: 12/26/2022]

Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010;73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]

Protein and gene model inference based on statistical modeling in k-partite graphs. Proc Natl Acad Sci U S A 2010;107:12101-6. [PMID: 20562346 DOI: 10.1073/pnas.0907654107] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010;17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Maier T, Güell M, Serrano L. Correlation of mRNA and protein in complex biological samples. FEBS Lett 2010;583:3966-73. [PMID: 19850042 DOI: 10.1016/j.febslet.2009.10.036] [Citation(s) in RCA: 1235] [Impact Index Per Article: 88.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2009] [Revised: 10/09/2009] [Accepted: 10/14/2009] [Indexed: 01/12/2023]

Ochs MF. Knowledge-based data analysis comes of age. Brief Bioinform 2010;11:30-9. [PMID: 19854753 PMCID: PMC3700349 DOI: 10.1093/bib/bbp044] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2009] [Revised: 09/03/2009] [Indexed: 12/16/2022] Open