1
|
Kong W, Wong BJH, Hui HWH, Lim KP, Wang Y, Wong L, Goh WWB. ProJect: a powerful mixed-model missing value imputation method. Brief Bioinform 2023:bbad233. [PMID: 37419612 DOI: 10.1093/bib/bbad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 05/24/2023] [Accepted: 06/05/2023] [Indexed: 07/09/2023] Open
Abstract
Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS) (exhibiting 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
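The normalized root mean square error (NRMSE) reported in this abstract can be illustrated with a small sketch. The abstract does not state the exact formula, so normalizing by the variance of the true values is an assumption here (it is one common convention). The names `nrmse` and `mean_impute` are hypothetical, and mean imputation is only a trivial stand-in baseline, not ProJect itself.

```python
import math
import statistics


def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in column]


def nrmse(true_values, imputed_values):
    """NRMSE over the masked entries only.

    Assumed convention: sqrt(mean squared error / variance of the true values).
    """
    if len(true_values) != len(imputed_values):
        raise ValueError("true and imputed values must have the same length")
    mse = statistics.fmean(
        (t - i) ** 2 for t, i in zip(true_values, imputed_values)
    )
    return math.sqrt(mse / statistics.pvariance(true_values))


# Hypothetical toy data: hide two known values, impute them, then score.
complete = [1.0, 2.0, 3.0, 4.0, 5.0]
masked_positions = [1, 3]
with_missing = [None if k in masked_positions else v for k, v in enumerate(complete)]

imputed = mean_impute(with_missing)
truth = [complete[k] for k in masked_positions]
guess = [imputed[k] for k in masked_positions]
score = nrmse(truth, guess)
```

A lower score means the imputed values sit closer to the hidden truth, and a perfect imputation scores zero. Benchmarks of this kind mask values that are actually known so that the error can be measured.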
Affiliation(s)
- Weijia Kong
  - School of Biological Sciences, Nanyang Technological University, Singapore
  - Department of Computer Science, National University of Singapore, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Kai Peng Lim
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Yulan Wang
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore
|
2
|
Fan S, Weixuan W, Han H, Liansheng Z, Gang L, Jierui W, Yanshu Z. Role of NF-κB in lead exposure-induced activation of astrocytes based on bioinformatics analysis of hippocampal proteomics. Chem Biol Interact 2023; 370:110310. [PMID: 36539177 DOI: 10.1016/j.cbi.2022.110310] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/03/2022] [Revised: 12/05/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Lead (Pb), a heavy metal, is used in batteries, ceramics, paint, pipes, e-waste recycling, etc. Chronic Pb exposure can result in inflammation of the central nervous system, as well as neurobehavioral changes. Both glial cells and neurons are involved in central nervous injury following Pb exposure. However, significant cellular events and their key regulators following Pb exposure remain to be elucidated. In this study, rats were randomly exposed to 250 or 500 mg/L PbAc for 9 weeks. Hippocampal proteomics was performed using isobaric tags for relative and absolute quantification. Bioinformatics analysis identified 301 and 267 differentially expressed proteins in the low and high Pb exposure groups, respectively; these proteins were involved in biological processes including glial cell activation, neural nucleus development, and mRNA processing. Gene Set Enrichment Analysis identified astrocyte activation as a significant cellular event in the low- or high-dose Pb exposure group. Subsequently, in vivo and in vitro models of Pb exposure were established to confirm astrocyte activation. Glial fibrillary acidic protein expression in astrocytes was much higher in the Pb exposure group. Moreover, the mRNA expression of neurotoxic reactive astrocyte genes was much higher than that of the control group. The analysis of transcription factors identified NF-κB as the top transcription factor, which might regulate astrocyte activation following Pb exposure in the rat hippocampus. The data also showed that the inhibition of NF-κB transcription suppressed astrocyte activation following Pb exposure. Overall, astrocyte activation was one of the significant cellular events following Pb exposure in the rat hippocampus and was regulated by the NF-κB transcription factor, suggesting that inhibiting astrocyte activation may be a potential target for the prevention of Pb neurotoxicity.
Affiliation(s)
- Shi Fan
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Wang Weixuan
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Hao Han
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Zhang Liansheng
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Liu Gang
  - Department of Medicine, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Wang Jierui
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
- Zhang Yanshu
  - School of Public Health, North China University of Science and Technology, Tangshan, 062310, Hebei, China
  - Laboratory Animal Center, North China University of Science and Technology, Tangshan Hebei, 063210, People's Republic of China
|
3
|
Processes in DNA damage response from a whole-cell multi-omics perspective. iScience 2022; 25:105341. [PMID: 36339253 PMCID: PMC9633746 DOI: 10.1016/j.isci.2022.105341] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2021] [Revised: 08/10/2022] [Accepted: 10/10/2022] [Indexed: 11/09/2022] Open
Abstract
Technological advances have made it feasible to collect multi-condition multi-omic time courses of cellular response to perturbation, but the complexity of these datasets impedes discovery due to challenges in data management, analysis, visualization, and interpretation. Here, we report a whole-cell mechanistic analysis of HL-60 cellular response to bendamustine. We integrate both enrichment and network analysis to show the progression of DNA damage and programmed cell death over time in molecular, pathway, and process-level detail using an interactive analysis framework for multi-omics data. Our framework, Mechanism of Action Generator Involving Network analysis (MAGINE), automates network construction and enrichment analysis across multiple samples and platforms, which can be integrated into our annotated gene-set network to combine the strengths of networks and ontology-driven analysis. Taken together, our work demonstrates how multi-omics integration can be used to explore signaling processes at various resolutions and demonstrates multi-pathway involvement beyond the canonical bendamustine mechanism.
|
4
|
Yan K, Mei Z, Zhao J, Prodhan MAI, Obal D, Katragadda K, Doelling B, Hoetker D, Posa DK, He L, Yin X, Shah J, Pan J, Rai S, Lorkiewicz PK, Zhang X, Liu S, Bhatnagar A, Baba SP. Integrated Multilayer Omics Reveals the Genomic, Proteomic, and Metabolic Influences of Histidyl Dipeptides on the Heart. J Am Heart Assoc 2022; 11:e023868. [PMID: 35730646 PMCID: PMC9333374 DOI: 10.1161/jaha.121.023868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Background: Histidyl dipeptides such as carnosine are present in a micromolar to millimolar range in mammalian hearts. These dipeptides facilitate glycolysis by proton buffering. They form conjugates with reactive aldehydes, such as acrolein, and attenuate myocardial ischemia–reperfusion injury. Although these dipeptides exhibit multifunctional properties, a composite understanding of their role in the myocardium is lacking. Methods and Results: To identify histidyl dipeptide–mediated responses in the heart, we used an integrated triomics approach, which involved genome‐wide RNA sequencing, global proteomics, and unbiased metabolomics, to identify the effects of cardiospecific transgenic overexpression of the carnosine synthesizing enzyme, carnosine synthase (Carns), in mice. Our results showed that higher myocardial levels of histidyl dipeptides were associated with extensive changes in the levels of several microRNAs, which target the expression of contractile proteins, β‐fatty acid oxidation, and citric acid cycle (TCA) enzymes. Global proteomic analysis showed enrichment in the expression of contractile proteins, enzymes of β‐fatty acid oxidation, and the TCA in the Carns transgenic heart. Under aerobic conditions, the Carns transgenic hearts had lower levels of short‐ and long‐chain fatty acids as well as the TCA intermediate succinic acid, whereas, under ischemic conditions, the accumulation of fatty acids and TCA intermediates was significantly attenuated. Integration of multiple data sets suggested that β‐fatty acid oxidation and TCA pathways exhibit correlative changes in the Carns transgenic hearts at all 3 levels. Conclusions: Taken together, these findings reveal a central role of histidyl dipeptides in coordinated regulation of myocardial structure, function, and energetics.
Affiliation(s)
- Keqiang Yan
  - Beijing Institute of Genomics Chinese Academy of Sciences, Beishan Industrial Zone Shenzhen China
- Zhanlong Mei
  - Beijing Institute of Genomics Chinese Academy of Sciences, Beishan Industrial Zone Shenzhen China
- Jingjing Zhao
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Detlef Obal
  - Department of Anesthesiology and Perioperative and Pain Medicine Stanford University Palo Alto CA
- Kartik Katragadda
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Benjamin Doelling
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- David Hoetker
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Dheeraj Kumar Posa
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Liqing He
  - Department of Chemistry University of Louisville KY
- Xinmin Yin
  - Department of Chemistry University of Louisville KY
- Jasmit Shah
  - Department of Medicine, Medical college The Aga Khan University Nairobi Kenya
- Jianmin Pan
  - Biostatistics Shared Facility University of Louisville Health, Brown Cancer Center Louisville KY
- Shesh Rai
  - Biostatistics Shared Facility University of Louisville Health, Brown Cancer Center Louisville KY
- Pawel Konrad Lorkiewicz
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Xiang Zhang
  - Department of Chemistry University of Louisville KY
- Siqi Liu
  - Beijing Institute of Genomics Chinese Academy of Sciences, Beishan Industrial Zone Shenzhen China
- Aruni Bhatnagar
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
- Shahid P Baba
  - Diabetes and Obesity Center University of Louisville KY
  - Christina Lee Brown Envirome Institute University of Louisville KY USA
|
5
|
Hartl D, de Luca V, Kostikova A, Laramie J, Kennedy S, Ferrero E, Siegel R, Fink M, Ahmed S, Millholland J, Schuhmacher A, Hinder M, Piali L, Roth A. Translational precision medicine: an industry perspective. J Transl Med 2021; 19:245. [PMID: 34090480 PMCID: PMC8179706 DOI: 10.1186/s12967-021-02910-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/02/2021] [Accepted: 05/25/2021] [Indexed: 02/08/2023] Open
Abstract
In the era of precision medicine, digital technologies and artificial intelligence, drug discovery and development face unprecedented opportunities for product and business model innovation, fundamentally changing the traditional approach of how drugs are discovered, developed and marketed. Critical to this transformation is the adoption of new technologies in the drug development process, catalyzing the transition from serendipity-driven to data-driven medicine. This paradigm shift comes with a need for both translation and precision, leading to a modern Translational Precision Medicine approach to drug discovery and development. Key components of Translational Precision Medicine are multi-omics profiling, digital biomarkers, model-based data integration, artificial intelligence, biomarker-guided trial designs and patient-centric companion diagnostics. In this review, we summarize and critically discuss the potential and challenges of Translational Precision Medicine from a cross-industry perspective.
Affiliation(s)
- Dominik Hartl
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
  - Department of Pediatrics I, University of Tübingen, Tübingen, Germany
- Valeria de Luca
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Anna Kostikova
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Jason Laramie
  - Novartis Institutes for BioMedical Research, Cambridge, MA, USA
- Scott Kennedy
  - Novartis Institutes for BioMedical Research, Cambridge, MA, USA
- Enrico Ferrero
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Richard Siegel
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Martin Fink
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Markus Hinder
  - Novartis Institutes for BioMedical Research, Basel, Switzerland
- Luca Piali
  - Roche Innovation Center Basel, Basel, Switzerland
- Adrian Roth
  - Roche Innovation Center Basel, Basel, Switzerland
|
6
|
Chang AYF, Liao BY. Reduced Translational Efficiency of Eukaryotic Genes after Duplication Events. Mol Biol Evol 2021; 37:1452-1461. [PMID: 31904835 DOI: 10.1093/molbev/msz309] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022] Open
Abstract
Control of gene expression has been found to be predominantly determined at the level of protein translation. However, to date, reduced expression from duplicated genes in eukaryotes for dosage maintenance has only been linked to transcriptional control involving epigenetic mechanisms. Here, we hypothesize that dosage maintenance following gene duplication also involves regulation at the protein level. To test this hypothesis, we compared transcriptome and proteome data of yeast models, Saccharomyces cerevisiae and Schizosaccharomyces pombe, and worm models, Caenorhabditis elegans and Caenorhabditis briggsae, to investigate lineage-specifically duplicated genes. Duplicated genes in both eukaryotic models exhibited a reduced protein-to-mRNA abundance ratio. Moreover, dosage sensitive genes, represented by genes encoding protein complex subunits, reduced their protein-to-mRNA abundance ratios more significantly than the other genes after duplication events. An analysis of ribosome profiling (Ribo-Seq) data further showed that reduced translational efficiency was more prominent for dosage sensitive genes than for the other genes. Meanwhile, no difference in protein degradation rate was associated with duplication events. Translationally repressed duplicated genes were also more likely to be inhibited at the level of transcription. Taken together, these results suggest that translation-mediated dosage control is partially contributed by natural selection and it enhances transcriptional control in maintaining gene dosage after gene duplication events during eukaryotic genome evolution.
Affiliation(s)
- Andrew Ying-Fei Chang
  - Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, Republic of China
- Ben-Yang Liao
  - Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, Republic of China
|
7
|
Magnano CS, Gitter A. Automating parameter selection to avoid implausible biological pathway models. NPJ Syst Biol Appl 2021; 7:12. [PMID: 33623016 PMCID: PMC7902638 DOI: 10.1038/s41540-020-00167-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/05/2020] [Accepted: 12/07/2020] [Indexed: 11/28/2022] Open
Abstract
A common way to integrate and analyze large amounts of biological "omic" data is through pathway reconstruction: using condition-specific omic data to create a subnetwork of a generic background network that represents some process or cellular state. A challenge in pathway reconstruction is that adjusting pathway reconstruction algorithms' parameters produces pathways with drastically different topological properties and biological interpretations. Due to the exploratory nature of pathway reconstruction, there is no ground truth for direct evaluation, so parameter tuning methods typically used in statistics and machine learning are inapplicable. We developed the pathway parameter advising algorithm to tune pathway reconstruction algorithms to minimize biologically implausible predictions. We leverage background knowledge in pathway databases to select pathways whose high-level structure resembles that of manually curated biological pathways. At the core of this method is a graphlet decomposition metric, which measures topological similarity to curated biological pathways. In order to evaluate pathway parameter advising, we compare its performance in avoiding implausible networks and reconstructing pathways from the NetPath database with other parameter selection methods across four pathway reconstruction algorithms. We also demonstrate how pathway parameter advising can guide reconstruction of an influenza host factor network. Pathway parameter advising is method agnostic; it is applicable to any pathway reconstruction algorithm with tunable parameters.
Affiliation(s)
- Chris S Magnano
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
- Anthony Gitter
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
|
8
|
Li H, Wu Y, Liu W, Zhang XM, Gong JS, Shi JS, Xu ZH. iTRAQ-based quantitative proteomic analysis of Colletotrichum lini reveals ethanol induced mechanism for enhancing dihydroxylation efficiency of DHEA. J Proteomics 2020; 224:103851. [PMID: 32485395 DOI: 10.1016/j.jprot.2020.103851] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/04/2020] [Revised: 05/09/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]
Abstract
Colletotrichum lini is used as an industrial strain for the dihydroxylation of the steroid compound dehydroepiandrosterone (DHEA) to biosynthesize 3β,7α,15α-trihydroxy-5-androstene-17-one (7α,15α-diOH-DHEA), a key intermediate of the most popular oral contraceptive "Yasmin". This work aimed to enhance 7α,15α-diOH-DHEA production in C. lini CGMCC 6051 through ethanol induction. With 0.6% (v/v) ethanol induction and 10 g/L DHEA concentration, the 7α,15α-diOH-DHEA molar yield reached 58.8%, which was 67.5% higher than that of the control. iTRAQ-based quantitative proteomic analysis was applied to explore the probable molecular mechanism of the C. lini response to ethanol induction. A total of 50 differentially expressed proteins were affected by ethanol induction and could be related to multiple metabolic pathways. Most of the differentially expressed proteins were functionally mapped into pathways of transport, steroid metabolism, or redox reaction. Other proteins for energy, transcription and translation, and carbohydrate metabolism might have important roles in the cellular response to ethanol induction. In addition, the levels of cytochrome P450 and NAD(P)H-cytochrome P450 reductase were remarkably higher under ethanol induction, and their functions in DHEA dihydroxylation were first proposed in C. lini. Our results provide critical clues in revealing the dihydroxylation mechanism and are important for efficient microbiological hydroxylation of steroidal compounds in the future. BIOLOGICAL SIGNIFICANCE: The iTRAQ strategy was first used to compare the proteomes under ethanol induction during the dihydroxylation reaction by Colletotrichum lini CGMCC 6051. The changes in protein levels provided a comprehensive overview of DHEA dihydroxylation in C. lini, including the proteins for steroid metabolism, redox reaction, transport, transcription and translation, energy and carbohydrate metabolism. Cytochrome P450, NADPH-cytochrome P450 reductase, and NADH-cytochrome b5 reductase were highlighted due to their outstanding contribution to DHEA dihydroxylation. The results help us understand the molecular mechanism underlying ethanol induction in C. lini and would guide strain engineering to further improve dihydroxylation efficiency.
Affiliation(s)
- Hui Li
  - School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214122, China
- Yan Wu
  - The Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Wei Liu
  - School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214122, China
- Xiao-Mei Zhang
  - School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214122, China
- Jin-Song Gong
  - School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214122, China
- Jin-Song Shi
  - School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214122, China
- Zheng-Hong Xu
  - National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi 214122, China
  - The Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, Wuxi 214122, China
|
9
|
Nia AM, Chen T, Barnette BL, Khanipov K, Ullrich RL, Bhavnani SK, Emmett MR. Efficient identification of multiple pathways: RNA-Seq analysis of livers from 56Fe ion irradiated mice. BMC Bioinformatics 2020; 21:118. [PMID: 32192433 PMCID: PMC7082965 DOI: 10.1186/s12859-020-3446-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/09/2019] [Accepted: 03/06/2020] [Indexed: 12/25/2022] Open
Abstract
Background: Interactions of mRNAs with other mRNAs and other signaling molecules determine different biological pathways and functions. Gene co-expression network analysis methods have been widely used to identify correlation patterns between genes in various biological contexts (e.g., cancer, mouse genetics, yeast genetics). A challenge remains to identify an optimal partition of the networks where the individual modules (clusters) are neither too small to make any general inferences, nor too large to be biologically interpretable. Clustering thresholds for identification of modules are not systematically determined and depend on user-settable parameters requiring optimization. The absence of systematic threshold determination may result in suboptimal module identification and a large number of unassigned features. Results: In this study, we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline employs WGCNA, a software package widely used to perform different aspects of gene co-expression network analysis, and the Modularity Maximization algorithm to analyze novel RNA-Seq data to understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, along with experimental validation, show that using WGCNA combined with Modularity Maximization provides a more biologically interpretable network in our dataset than that obtainable using WGCNA alone. The proposed pipeline showed better performance than the existing clustering algorithm in WGCNA, and identified a module that was biologically validated by a mitochondrial complex I assay. Conclusions: We present a pipeline that can reduce the problem of parameter selection that occurs with the existing algorithm in WGCNA, for applicable RNA-Seq datasets. This may assist in the future discovery of novel mRNA interactions, and elucidation of their potential downstream molecular effects.
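The quantity that modularity maximization tries to increase can be shown on a toy network. The sketch below only scores a given partition with the standard Newman modularity for an unweighted, undirected graph; it does not search for the best partition and is not the authors' pipeline or WGCNA. The function name `modularity` and the example graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split)    # one module per triangle
q_merged = modularity(edges, merged)  # everything in a single module
```

Here the two-module partition scores higher than the single-module one, which is the sense in which a modularity-based criterion prefers modules that are densely connected internally and sparsely connected to each other.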
Affiliation(s)
- Anna M Nia
  - Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
- Tianlong Chen
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, Texas, USA
- Brooke L Barnette
  - Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
- Kamil Khanipov
  - Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, Texas, USA
- Suresh K Bhavnani
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, Texas, USA
- Mark R Emmett
  - Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
  - Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, Texas, USA
  - Radiation Oncology, The University of Texas Medical Branch, Galveston, Texas, USA
|
10
|
Proteomic investigation of intra-tumor heterogeneity using network-based contextualization - A case study on prostate cancer. J Proteomics 2019; 206:103446. [PMID: 31323421 DOI: 10.1016/j.jprot.2019.103446] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/17/2019] [Revised: 06/12/2019] [Accepted: 07/08/2019] [Indexed: 12/26/2022]
Abstract
Cancer is a heterogeneous disease, confounding the identification of relevant markers and drug targets. Network-based analysis is robust against noise, potentially offering a promising approach towards biomarker identification. We describe here the application of two network-based methods, qPSP (Quantitative Proteomics Signature Profiling) and PFSNet (Paired Fuzzy SubNetworks), in an intra-tissue proteome data set of prostate tissue samples. Despite high basal variation, we find that traditional statistical analysis may exaggerate the extent of heterogeneity. We also report that network-based analysis outperforms protein-based feature selection with concomitantly higher cross-validation accuracy. Overall, network-based analysis provides emergent signal that boosts sensitivity while retaining good precision. It is a potential means of circumventing heterogeneity for stable biomarker discovery.
|
11
|
Zhao Y, Sue ACH, Goh WWB. Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data. J Bioinform Comput Biol 2019; 17:1950013. [DOI: 10.1142/s0219720019500136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from new proteomics technology (SWATH) and also check for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluate the objectivity of the FCS p-value, and follow up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective, and are confounded strongly by complex size, (2) the best recovery performance does not necessarily lie at standard p-value cutoffs, (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes, and are further confounded by issues relating to network coverage and quality and (4) moderate-sized complexes of size 5 to 10 still exhibit considerable instability; we find that FCS works best with big complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
Affiliation(s)
- Yaxing Zhao
  - School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
- Andrew Chi-Hau Sue
  - School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
|
12
|
Li T, Wu Q, Duan X, Yun Z, Jiang Y. Proteomic and transcriptomic analysis to unravel the influence of high temperature on banana fruit during postharvest storage. Funct Integr Genomics 2019; 19:467-486. [DOI: 10.1007/s10142-019-00662-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/13/2018] [Revised: 01/21/2019] [Accepted: 01/31/2019] [Indexed: 11/29/2022]
|
13
|
Zhou L, Wong L, Goh WWB. Understanding missing proteins: a functional perspective. Drug Discov Today 2018; 23:644-651. [DOI: 10.1016/j.drudis.2017.11.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/07/2017] [Revised: 10/24/2017] [Accepted: 11/13/2017] [Indexed: 01/03/2023]
|
14
|
Yan F, Mo X, Liu J, Ye S, Zeng X, Chen D. Thymic function in the regulation of T cells, and molecular mechanisms underlying the modulation of cytokines and stress signaling (Review). Mol Med Rep 2017; 16:7175-7184. [PMID: 28944829 PMCID: PMC5865843 DOI: 10.3892/mmr.2017.7525] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 08/15/2016] [Accepted: 05/12/2017] [Indexed: 01/08/2023] Open
Abstract
The thymus is critical in establishing and maintaining the appropriate microenvironment for promoting the development and selection of T cells. The function and structure of the thymus gland have been extensively studied, particularly as the thymus serves an important physiological role in the lymphatic system. Numerous studies have investigated the morphological features of thymic involution. Recently, research attention has increasingly been focused on thymic proteins as targets for drug intervention. Omics approaches have yielded novel insights into the thymus and possible drug targets. The present review addresses the signaling and transcriptional functions of the thymus, including the molecular mechanisms underlying the regulatory functions of T cells and their role in the immune system. In addition, the levels of cytokines secreted in the thymus have a significant effect on thymic functions, including thymocyte migration and development, thymic atrophy and thymic recovery. Furthermore, the regulation and molecular mechanisms of stress-mediated thymic atrophy and involution were investigated, with particular emphasis on thymic function as a potential target for drug development and discovery using proteomics.
Affiliation(s)
- Fenggen Yan
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
- Xiumei Mo
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
- Junfeng Liu
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
- Siqi Ye
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
- Xing Zeng
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
- Dacan Chen
  - Department of Dermatology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong 510120, P.R. China
|
15
|
Abstract
Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/, and online documentation is available at http://rpubs.com/gohwils/204259.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
  - Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074
|
16
|
Goh WWB, Wong L. Class-paired Fuzzy SubNETs: A paired variant of the rank-based network analysis family for feature selection based on protein complexes. Proteomics 2017; 17:e1700093. [PMID: 28390171 DOI: 10.1002/pmic.201700093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/09/2017] [Accepted: 04/05/2017] [Indexed: 01/12/2023]
Abstract
Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank-defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, P. R. China
  - Department of Computer Science, National University of Singapore, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore
  - Department of Pathology, National University of Singapore, Singapore
|
17
|
Proteomics analysis of Fusarium proliferatum under various initial pH during fumonisin production. J Proteomics 2017; 164:59-72. [PMID: 28522339 DOI: 10.1016/j.jprot.2017.05.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/01/2017] [Revised: 05/01/2017] [Accepted: 05/08/2017] [Indexed: 11/23/2022]
Abstract
Fusarium proliferatum is a fungal pathogen that can produce fumonisin, which poses a great threat to animal and human health. The proteomic approach is a useful tool for investigating mycotoxin biosynthesis in fungal pathogens. In this study, we analyzed the fumonisin content and mycelium proteins of Fusarium proliferatum cultivated under initial pH 5 and pH 10. Fumonisin production after 10 days was significantly more strongly induced in culture at pH 10 than at pH 5. Ninety-nine protein spots that accumulated significantly differently under the two pH conditions were detected using two-dimensional polyacrylamide gel electrophoresis, and 89 of these proteins were successfully identified by MALDI-TOF/TOF and LC-ESI-MS/MS analysis. Among these 89 proteins, 45 were up-regulated at pH 10 while 44 were up-accumulated at pH 5. At pH 10, these proteins were found to be involved in the modification of the fumonisin backbone, including up-regulated polyketide synthase, cytochrome P450, S-adenosylmethionine synthase and O-methyltransferase, which might contribute to the induction of fumonisin production. At pH 5, up-regulated proteins such as l-amino-acid oxidase, isocitrate dehydrogenase and citrate lyase might inhibit the condensation of the fumonisin backbone, resulting in reduced production of fumonisins. These results may help us to understand the molecular mechanism of fumonisin synthesis in F. proliferatum.
BIOLOGICAL SIGNIFICANCE: To extend our understanding of the mechanism of fumonisin biosynthesis in F. proliferatum, we reported fumonisin production in relation to the differential proteins of F. proliferatum mycelium under two pH culture conditions. Among the 89 identified spots, 45 were up-accumulated at pH 10 while 44 were up-accumulated at pH 5. Our results revealed that increased fumonisin production at pH 10 might be related to the induction of fumonisin biosynthesis caused by up-regulation of polyketide synthase, cytochrome P450, S-adenosylmethionine synthase and O-methyltransferase. Meanwhile, the up-regulation of l-amino-acid oxidase, isocitrate dehydrogenase and citrate lyase at pH 5 might be related to the inhibition of the condensation of the fumonisin backbone, resulting in reduced production of fumonisin. These results may help us to better understand the molecular mechanism of fumonisin synthesis in F. proliferatum and broaden the current knowledge of the mechanism of fumonisin biosynthesis.
|
18
|
Goh WWB, Wong L. Protein complex-based analysis is resistant to the obfuscating consequences of batch effects --- a case study in clinical proteomics. BMC Genomics 2017; 18:142. [PMID: 28361693 PMCID: PMC5374662 DOI: 10.1186/s12864-017-3490-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/06/2023] Open
Abstract
Background: In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research. Results: Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity, and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs highly correlated with batch but not class effect. However, neither PC-based nor existing batch effect-correction methods adequately address subtle batch effects, which are difficult to eradicate, and both involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch-effect resistant methods and demonstrate how such methods incorporating protein complexes are particularly resistant to batch effects without compromising data integrity. Conclusions: Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance against batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3490-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, 300072, People's Republic of China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
  - Department of Pathology, National University of Singapore, Singapore, Singapore
|
19
|
Lee PKM, Goh WWB, Sng JCG. Network-based characterization of the synaptic proteome reveals that removal of epigenetic regulator Prmt8 restricts proteins associated with synaptic maturation. J Neurochem 2017; 140:613-628. [PMID: 27935040 DOI: 10.1111/jnc.13921] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/08/2016] [Revised: 11/30/2016] [Accepted: 12/04/2016] [Indexed: 12/13/2022]
Abstract
The brain adapts to dynamic environmental conditions by altering its epigenetic state, thereby influencing neuronal transcriptional programs. An example of an epigenetic modification is protein methylation, catalyzed by protein arginine methyltransferases (PRMT). One member, Prmt8, is selectively expressed in the central nervous system during a crucial phase of early development, but little else is known regarding its function. We hypothesize Prmt8 plays a role in synaptic maturation during development. To evaluate this, we used a proteome-wide approach to characterize the synaptic proteome of Prmt8 knockout versus wild-type mice. Through comparative network-based analyses, proteins and functional clusters related to neurite development were identified to be differentially regulated between the two genotypes. One interesting protein that was differentially regulated was tenascin-R (TNR). Chromatin immunoprecipitation demonstrated binding of PRMT8 to the tenascin-r (Tnr) promoter. TNR, a component of perineuronal nets, preserves structural integrity of synaptic connections within neuronal networks during the development of visual-somatosensory cortices. On closer inspection, Prmt8 removal increased net formation and decreased inhibitory parvalbumin-positive (PV+) puncta on pyramidal neurons, thereby hindering the maturation of circuits. Consequently, visual acuity of the knockout mice was reduced. Our results demonstrated Prmt8's involvement in synaptic maturation and its prospect as an epigenetic modulator of developmental neuroplasticity by regulating structural elements such as the perineuronal nets.
Affiliation(s)
- Patrick Kia Ming Lee
  - Integrative Neuroscience Program, Singapore Institute for Clinical Sciences, Agency for Science Technology and Research (A*STAR), Singapore
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
  - Department of Computer Science, National University of Singapore, Singapore
- Judy Chia Ghee Sng
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
20
|
Mustafin ZS, Lashin SA, Matushkin YG, Gunbin KV, Afonnikov DA. Orthoscape: a cytoscape application for grouping and visualization KEGG based gene networks by taxonomy and homology principles. BMC Bioinformatics 2017; 18:1427. [PMID: 28466792 PMCID: PMC5333177 DOI: 10.1186/s12859-016-1427-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/01/2023] Open
Abstract
Background: There are many available software tools for visualization and analysis of biological networks. Among them, Cytoscape (http://cytoscape.org/) is one of the most comprehensive packages, with many plugins and applications that extend its functionality by providing analysis of protein-protein interaction, gene regulatory and gene co-expression networks, metabolic, signaling, neural as well as ecological-type networks including food webs, communities networks etc. Nevertheless, only three plugins tagged ‘network evolution’ are found in the Cytoscape official app store and in the literature. We have developed a new Cytoscape 3.0 application, Orthoscape, which aims to facilitate evolutionary analysis of gene networks and visualize the results. Results: Orthoscape aids in analysis of evolutionary information available for gene sets and networks by highlighting: (1) the orthology relationships between genes; (2) the evolutionary origin of gene network components; (3) the evolutionary pressure mode (diversifying or stabilizing, negative or positive selection) of orthologous groups in general and/or branch-oriented mode. The distinctive feature of Orthoscape is the ability to control all data analysis steps via a user-friendly interface. Conclusion: Orthoscape allows its users to analyze gene networks or separated gene sets in the context of evolution. At each step of data analysis, Orthoscape also provides for convenient visualization and data manipulation. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1427-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sergey Alexandrovich Lashin
  - Institute of Cytology and Genetics SB RAS, Lavrentiev Avenue 10, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Pirogova st. 2, Novosibirsk, 630090, Russia
- Dmitry Arkadievich Afonnikov
  - Institute of Cytology and Genetics SB RAS, Lavrentiev Avenue 10, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Pirogova st. 2, Novosibirsk, 630090, Russia
|
21
|
Goh WWB. Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics. BMC Med Genomics 2016; 9:67. [PMID: 28117654 PMCID: PMC5260792 DOI: 10.1186/s12920-016-0228-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/20/2023] Open
Abstract
Background: The hypergeometric enrichment analysis approach typically fares poorly in feature-selection stability due to its upstream reliance on the t-test to generate differential protein lists before testing for enrichment on a protein complex, subnetwork or gene group. Methods: Swapping the t-test in favour of a fuzzy rank-based weight system similar to that used in network-based methods like Quantitative Proteomics Signature Profiling (QPSP), Fuzzy SubNets (FSNET) and paired FSNET (PFSNET) produces dramatic improvements. Results: This approach, Fuzzy-FishNET, exhibits high precision-recall over three sets of simulated data (with simulated protein complexes) while excelling in feature-selection reproducibility on real data (based on evaluation with real protein complexes). Overlap comparisons with PFSNET show that Fuzzy-FishNET selects the most significant complexes, which are also strongly class-discriminative. Cross-validation further demonstrates that Fuzzy-FishNET selects class-relevant protein complexes. Conclusions: Based on evaluation with simulated and real datasets, Fuzzy-FishNET is a significant upgrade of the traditional hypergeometric enrichment approach and a powerful new entrant amongst comparative proteomics analysis methods. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-016-0228-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, People's Republic of China
|
22
|
Goh WWB, Wong L. Spectra-first feature analysis in clinical proteomics - A case study in renal cancer. J Bioinform Comput Biol 2016; 14:1644004. [DOI: 10.1142/s0219720016440042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
In proteomics, useful signal may be unobserved or lost due to the lack of confident peptide-spectral matches. Selection of differential spectra, followed by associative peptide/protein mapping may be a complementary strategy for improving sensitivity and comprehensiveness of analysis (spectra-first paradigm). This approach is complementary to the standard approach where functional analysis is performed only on the finalized protein list assembled from identified peptides from the spectra (protein-first paradigm). Based on a case study of renal cancer, we introduce a simple spectra-binning approach, MZ-bin. We demonstrate that differential spectra feature selection using MZ-bin is class-discriminative and can trace relevant proteins via spectra associative mapping. Moreover, proteins identified in this manner are more biologically coherent than those selected directly from the finalized protein list. Analysis of constituent peptides per protein reveals high expression inconsistency, suggesting that the measured protein expressions are in fact, poor approximations of true protein levels. Moreover, analysis at the level of constituent peptides may provide higher resolution insight into the underlying biology: Via MZ-bin, we identified for the first time differential splice forms for the known renal cancer marker MAPT. We conclude that the spectra-first analysis paradigm is a complementary strategy to the traditional protein-first paradigm and can provide deeper level insight.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, P. R. China
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417 Singapore
|
23
|
Fasano M, Monti C, Alberio T. A systems biology-led insight into the role of the proteome in neurodegenerative diseases. Expert Rev Proteomics 2016; 13:845-55. [PMID: 27477319 DOI: 10.1080/14789450.2016.1219254] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Introduction: Multifactorial disorders are the result of nonlinear interactions of several factors; therefore, a reductionist approach does not appear to be appropriate. Proteomics is a global approach that can be efficiently used to investigate pathogenetic mechanisms of neurodegenerative diseases. Areas covered: Here, we report a general introduction about the systems biology approach and mechanistic insights recently obtained by over-representation analysis of proteomics data of cellular and animal models of Alzheimer's disease, Parkinson's disease and other neurodegenerative disorders, as well as of affected human tissues. Expert commentary: As an inductive method, proteomics is based on unbiased observations that further require validation of generated hypotheses. Pathway databases and over-representation analysis tools allow researchers to assign an expectation value to pathogenetic mechanisms linked to neurodegenerative diseases. The systems biology approach based on omics data may be the key to unravel the complex mechanisms underlying neurodegeneration.
Affiliation(s)
- Mauro Fasano
  - Department of Science and High Technology and Center of Neuroscience, University of Insubria, Busto Arsizio, Italy
- Chiara Monti
  - Department of Science and High Technology and Center of Neuroscience, University of Insubria, Busto Arsizio, Italy
- Tiziana Alberio
  - Department of Science and High Technology and Center of Neuroscience, University of Insubria, Busto Arsizio, Italy
|
24
|
Goh WWB, Wong L. Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms. J Proteome Res 2016; 15:3167-79. [DOI: 10.1021/acs.jproteome.6b00402] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023]
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
  - Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 117417
|
25
|
Goh WWB, Wong L. Evaluating feature-selection stability in next-generation proteomics. J Bioinform Comput Biol 2016; 14:1650029. [PMID: 27640811 DOI: 10.1142/s0219720016500293] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Indexed: 01/10/2023]
Abstract
Identifying reproducible yet relevant features is a major challenge in biological research. This is well documented in genomics data. Using a proposed set of three reliability benchmarks, we find that this issue exists also in proteomics for commonly used feature-selection methods, e.g. t-test and recursive feature elimination. Moreover, due to high test variability, selecting the top proteins based on p-value ranks, even when restricted to high-abundance proteins, does not improve reproducibility. Statistical testing based on networks is believed to be more robust, but this does not always hold true: the commonly used hypergeometric enrichment that tests for enrichment of protein subnets performs abysmally due to its dependence on unstable protein pre-selection steps. We demonstrate here for the first time the utility of a novel suite of network-based algorithms called rank-based network algorithms (RBNAs) on proteomics. These were originally introduced and tested extensively on genomics data. We show here that they are highly stable, reproducible and select relevant features when applied to proteomics data. It is also evident from these results that use of statistical feature testing on protein expression data should be executed with due caution. Careless use of networks does not resolve poor-performance issues, and can even mislead. We recommend augmenting statistical feature-selection methods with concurrent analysis on stability and reproducibility to improve the quality of the selected features prior to experimental validation.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417 Singapore
- Limsoon Wong
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417 Singapore
|
26
|
Design principles for clinical network-based proteomics. Drug Discov Today 2016; 21:1130-8. [DOI: 10.1016/j.drudis.2016.05.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/06/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 01/10/2023]
|
27
|
Gao SG, Liu RM, Zhao YG, Wang P, Ward DG, Wang GC, Guo XQ, Gu J, Niu WB, Zhang T, Martin A, Guo ZP, Feng XS, Qi YJ, Ma YF. Integrative topological analysis of mass spectrometry data reveals molecular features with clinical relevance in esophageal squamous cell carcinoma. Sci Rep 2016; 6:21586. [PMID: 26898710 PMCID: PMC4761933 DOI: 10.1038/srep21586] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/15/2015] [Accepted: 01/26/2016] [Indexed: 02/06/2023] Open
Abstract
Combining MS-based proteomic data with network and topological features of such network would identify more clinically relevant molecules and meaningfully expand the repertoire of proteins derived from MS analysis. The integrative topological indexes representing 95.96% information of seven individual topological measures of node proteins were calculated within a protein-protein interaction (PPI) network, built using 244 differentially expressed proteins (DEPs) identified by iTRAQ 2D-LC-MS/MS. Compared with DEPs, differentially expressed genes (DEGs) and comprehensive features (CFs), structurally dominant nodes (SDNs) based on integrative topological index distribution produced comparable classification performance in three different clinical settings using five independent gene expression data sets. The signature molecules of SDN-based classifier for distinction of early from late clinical TNM stages were enriched in biological traits of protein synthesis, intracellular localization and ribosome biogenesis, which suggests that ribosome biogenesis represents a promising therapeutic target for treating ESCC. In addition, ITGB1 expression selected exclusively by integrative topological measures correlated with clinical stages and prognosis, which was further validated with two independent cohorts of ESCC samples. Thus the integrative topological analysis of PPI networks proposed in this study provides an alternative approach to identify potential biomarkers and therapeutic targets from MS/MS data with functional insights in ESCC.
Collapse
Affiliation(s)
- She-Gan Gao
- Henan Key Laboratory of Cancer Epigenetics, Cancer Institute, The First Affiliated Hospital, College of Clinical Medicine, Henan University of Science and Technology, Luoyang, P. R. China, 471003
| | - Rui-Min Liu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Yun-Gang Zhao
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Pei Wang
- School of Mathematics and Statistics, Henan University, Kaifeng, China, Henan 475004, P. R. China
| | - Douglas G. Ward
- School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Guang-Chao Wang
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Xiang-Qian Guo
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Juan Gu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Wan-Bin Niu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Tian Zhang
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Ashley Martin
- School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Zhi-Peng Guo
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Xiao-Shan Feng
- Henan Key Laboratory of Cancer Epigenetics, Cancer Institute, The First Affiliated Hospital, College of Clinical Medicine, Henan University of Science and Technology, Luoyang, P. R. China, 471003
| | - Yi-Jun Qi
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
| | - Yuan-Fang Ma
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
|
28
|
Ansari-Pour N, Razaghi-Moghadam Z, Barneh F, Jafari M. Testis-Specific Y-Centric Protein-Protein Interaction Network Provides Clues to the Etiology of Severe Spermatogenic Failure. J Proteome Res 2016; 15:1011-22. [PMID: 26794825 DOI: 10.1021/acs.jproteome.5b01080] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/24/2023]
Abstract
Pinpointing causal genes for spermatogenic failure (SpF) on the Y chromosome has been a daunting challenge, with setbacks during the past decade. Since complex diseases result from the interaction of multiple genes and also display considerable missing heritability, network analysis is more likely to explicate an etiological molecular basis. We therefore took a network medicine approach by integrating interactome (protein-protein interaction (PPI)) and transcriptome data to reconstruct a Y-centric SpF network. Two sets of seed genes (Y genes and SpF-implicated genes (SIGs)) were used for network reconstruction. Since no PPI was observed among Y genes, we identified their common immediate interactors. Interestingly, 81% (N = 175) of these interactors not only interacted directly with SIGs but were also enriched for differentially expressed genes (89.6%; N = 43). The SpF network, formed mainly by the dys-regulated interactors and the two seed gene sets, comprised three modules enriched for ribosomal proteins and nuclear receptors for sex hormones. Ribosomal proteins generally showed significant dys-regulation, with RPL39L, thought to be expressed at the onset of spermatogenesis, strongly down-regulated. This network is the first global PPI network pertaining to severe SpF and, if experimentally validated on independent data sets, can lead to more accurate diagnosis and potential fertility recovery of patients.
Affiliation(s)
- Naser Ansari-Pour
- Faculty of New Sciences and Technology, University of Tehran, North Kargar Street, Tehran 143995-7131, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5531, Iran
| | - Zahra Razaghi-Moghadam
- Faculty of New Sciences and Technology, University of Tehran, North Kargar Street, Tehran 143995-7131, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5531, Iran
| | - Farnaz Barneh
- Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran 198396-3113, Iran
| | - Mohieddin Jafari
- Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 131694-3551, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5531, Iran
|
29
|
Lin LL, Hsu CL, Hu CW, Ko SY, Hsieh HL, Huang HC, Juan HF. Integrating Phosphoproteomics and Bioinformatics to Study Brassinosteroid-Regulated Phosphorylation Dynamics in Arabidopsis. BMC Genomics 2015; 16:533. [PMID: 26187819 PMCID: PMC4506601 DOI: 10.1186/s12864-015-1753-4] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 11/02/2014] [Accepted: 07/06/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein phosphorylation regulated by plant hormones is involved in the coordination of fundamental plant development. The phosphorylation dynamics regulated by brassinosteroids (BRs), a group of phytohormones, remain to be delineated in plants. In this study, we performed mass spectrometry (MS)-based phosphoproteomics to conduct global and dynamic phosphoproteome profiling across five time points of BR treatment in the period between 5 min and 12 h. MS coupled with phosphopeptide enrichment techniques has become a powerful tool for profiling protein phosphorylation. However, MS-based methods tend to have data consistency and coverage issues. To address these issues, bioinformatics approaches were used to complement the non-detected proteins and recover the dynamics of phosphorylation events. RESULTS A total of 1104 unique phosphorylated peptides from 739 unique phosphoproteins were identified. The time-dependent gene ontology (GO) analysis shows the transition of biological processes from signal transduction to morphogenesis and stress response. The protein-protein interaction analysis found that most of the identified phosphoproteins have strong connections with known BR signaling components. Motif-X analysis identified 15 enriched motifs, 11 of which correspond to 6 known kinase families. To uncover the dynamic activities of kinases, the enriched motifs were combined with phosphorylation profiles, which revealed that the substrates of casein kinase 2 and mitogen-activated protein kinase were significantly phosphorylated and dephosphorylated, respectively, at the initial time of BR treatment. The time-dependent kinase-substrate interaction networks were constructed and showed that many substrates are downstream of other signals, such as auxin and ABA signaling. When comparing the BR-responsive phosphoproteome with gene expression data, we found that most phosphorylation changes were not driven by gene expression changes. Our results suggest that many downstream proteins of BR signaling are induced by phosphorylation via various kinases, not through transcriptional regulation. CONCLUSIONS Through a large-scale dynamic profile of the phosphoproteome coupled with bioinformatics, a complicated kinase-centered network related to BR-regulated growth was deciphered. The phosphoproteins and phosphosites identified in our study provide a useful dataset for revealing signaling networks of BR regulation, expand our knowledge of protein phosphorylation in plants, and may further help to address plant growth problems.
Affiliation(s)
- Li-Ling Lin
- Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
| | - Chia-Lang Hsu
- Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
| | - Chia-Wei Hu
- Institute of Molecular and Cellular Biology, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
| | - Shiao-Yun Ko
- Institute of Molecular and Cellular Biology, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
| | - Hsu-Liang Hsieh
- Institute of Plant Biology, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
| | - Hsuan-Cheng Huang
- Institute of Biomedical Informatics, Center for Systems and Synthetic Biology, National Yang-Ming University, No.155, Sec.2, Linong Street, Taipei, 112, Taiwan.
| | - Hsueh-Fen Juan
- Department of Life Science, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan; Institute of Molecular and Cellular Biology, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan; Graduate Institute of Biomedical Electronic and Bioinformatics, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
|
30
|
Webb-Robertson BJM, Matzke MM, Datta S, Payne SH, Kang J, Bramer LM, Nicora CD, Shukla AK, Metz TO, Rodland KD, Smith RD, Tardiff MF, McDermott JE, Pounds JG, Waters KM. Bayesian proteoform modeling improves protein quantification of global proteomic measurements. Mol Cell Proteomics 2015; 13:3639-46. [PMID: 25433089 DOI: 10.1074/mcp.m113.030932] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022] Open
Abstract
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that, with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian Proteoform Quantification model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple overexpressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to that of the current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB® and R packages.
Affiliation(s)
- Bobbie-Jo M Webb-Robertson
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Melissa M Matzke
- Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Susmita Datta
- Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202
| | - Samuel H Payne
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Jiyun Kang
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Lisa M Bramer
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Carrie D Nicora
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Anil K Shukla
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Thomas O Metz
- Omics Biological Applications, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Karin D Rodland
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Mark F Tardiff
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Jason E McDermott
- Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Joel G Pounds
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Katrina M Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
|
31
|
Webb-Robertson BJM, Wiberg HK, Matzke MM, Brown JN, Wang J, McDermott JE, Smith RD, Rodland KD, Metz TO, Pounds JG, Waters KM. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res 2015; 14:1993-2001. [PMID: 25855118 DOI: 10.1021/pr501138h] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Indexed: 12/14/2022]
Abstract
In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performance with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
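As a rough illustration of how imputation strategies such as those compared in this review can be scored, the sketch below hides known values in a synthetic matrix, fills them with two simple baselines, and computes a normalized root mean square error (NRMSE) on the hidden entries. Everything here is an assumption for illustration: the synthetic data, the two baselines (row mean and global minimum), and the choice of NRMSE are not taken from the review, which evaluates other algorithms with metrics of variance and classification.

```python
import numpy as np

def nrmse(truth, imputed, mask):
    """RMSE over the masked entries, divided by the standard deviation
    of the true values at those entries."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask]))

def impute_row_mean(data):
    """Replace each NaN with the mean of the observed values in its row."""
    filled = data.copy()
    row_means = np.nanmean(data, axis=1)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = row_means[rows]
    return filled

def impute_global_min(data):
    """Replace each NaN with the smallest observed value in the matrix."""
    filled = data.copy()
    filled[np.isnan(data)] = np.nanmin(data)
    return filled

# Synthetic log-intensity matrix (hypothetical: 50 features x 8 samples).
rng = np.random.default_rng(0)
truth = rng.normal(loc=20.0, scale=2.0, size=(50, 8))

# Hide about 10% of entries completely at random.
mask = rng.random(truth.shape) < 0.1
observed = truth.copy()
observed[mask] = np.nan

scores = {
    "row_mean": nrmse(truth, impute_row_mean(observed), mask),
    "global_min": nrmse(truth, impute_global_min(observed), mask),
}
print(scores)
```

Because the values here are hidden completely at random, the minimum-value fill is expected to score worse than the row mean; with left-censored (abundance-dependent) missingness the ranking could differ, which is in line with the review's point that no single method suits every dataset.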
Affiliation(s)
| | - Holli K Wiberg
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Melissa M Matzke
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Joseph N Brown
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Jing Wang
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Jason E McDermott
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Richard D Smith
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Karin D Rodland
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Thomas O Metz
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Joel G Pounds
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
| | - Katrina M Waters
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
|
32
|
Laukens K, Naulaerts S, Berghe WV. Bioinformatics approaches for the functional interpretation of protein lists: from ontology term enrichment to network analysis. Proteomics 2015; 15:981-96. [PMID: 25430566 DOI: 10.1002/pmic.201400296] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/30/2014] [Revised: 10/16/2014] [Accepted: 11/24/2014] [Indexed: 12/24/2022]
Abstract
The main result of a great deal of the published proteomics studies is a list of identified proteins, which then needs to be interpreted in relation to the research question and existing knowledge. In the early days of proteomics this interpretation was only based on expert insights, acquired by digesting a large amount of relevant literature. With the growing size and complexity of the experimental datasets, many computational techniques, databases, and tools have claimed a central role in this task. In this review we discuss commonly and less commonly used methods to functionally interpret experimental proteome lists and compare them with available knowledge. We first address several functional analysis and enrichment techniques based on ontologies and literature. Then we outline how various types of network and pathway information can be used. While the problem of functional interpretation of proteome data is to an extent equivalent to the interpretation of transcriptome or other "omics" data, this paper addresses some of the specific challenges and solutions of the proteomics field.
Affiliation(s)
- Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp / Antwerp University Hospital, Antwerp, Belgium
|
33
|
Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AHC, Speijer D, Moerland PD. Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets. BMC Bioinformatics 2015; 16:25. [PMID: 25627479 PMCID: PMC4384356 DOI: 10.1186/s12859-015-0455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2014] [Accepted: 01/09/2015] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins by their isoelectric point and apparent molecular mass and to quantify changes in protein expression. Abundantly available proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins. RESULTS We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%. CONCLUSIONS Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which especially the low-abundant protein spots remain to be identified.
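The leave-one-out top-k evaluation mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the scoring function, the dictionary layout, and the protein names `P1` to `P4` are hypothetical and are not the authors' method or data. The sketch hides the identity of each identified spot in turn, ranks all candidates using the spot's isoelectric point, mass, and function terms shared with the remaining identified proteins, and counts how often the true protein lands in the top k.

```python
def rank_candidates(spot, candidates, known_functions):
    """Order candidate names from best to worst for one spot.

    Hypothetical score: distance in (pI, mass) minus the number of function
    terms shared with already identified proteins; lower is better.
    """
    def score(cand):
        distance = abs(cand["pi"] - spot["pi"]) + abs(cand["mass"] - spot["mass"]) / 10.0
        return distance - len(cand["functions"] & known_functions)
    return [cand["name"] for cand in sorted(candidates, key=score)]


def loo_top_k_rate(spots, candidates, k=5):
    """Fraction of identified spots whose true protein ranks in the top k
    when that spot's identity is hidden (leave-one-out)."""
    by_name = {cand["name"]: cand for cand in candidates}
    hits = 0
    for i, held_out in enumerate(spots):
        # Function terms of every identified protein except the held-out one.
        known = set()
        for j, other in enumerate(spots):
            if j != i:
                known |= by_name[other["protein"]]["functions"]
        ranking = rank_candidates(held_out, candidates, known)
        hits += held_out["protein"] in ranking[:k]
    return hits / len(spots)


# Toy data (hypothetical values, mass in kDa).
candidates = [
    {"name": "P1", "pi": 5.0, "mass": 30.0, "functions": {"translation"}},
    {"name": "P2", "pi": 5.1, "mass": 31.0, "functions": {"translation"}},
    {"name": "P3", "pi": 8.0, "mass": 60.0, "functions": {"transport"}},
    {"name": "P4", "pi": 6.5, "mass": 45.0, "functions": {"splicing"}},
]
spots = [
    {"pi": 5.05, "mass": 30.2, "protein": "P1"},
    {"pi": 5.15, "mass": 30.8, "protein": "P2"},
    {"pi": 7.9, "mass": 59.0, "protein": "P3"},
]
rate = loo_top_k_rate(spots, candidates, k=1)
print(rate)
```

On real gels the pI and mass estimates are noisy and the candidate list is far longer, so a top-5 rate well below 1 (such as the 43.8% reported above) is plausible; the toy data here are deliberately clean.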
Affiliation(s)
- Umesh K Nandal
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Wytze J Vlietstra
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Carsten Byrman
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Rienk E Jeeninga
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Jeffrey H Ringrose
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Antoine H C van Kampen
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands; Biosystems Data Analysis Group, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands.
| | - Dave Speijer
- Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
| | - Perry D Moerland
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, DE Amsterdam, 1100, The Netherlands.
|
34
|
Sun J, Zhang GL, Li S, Ivanov AR, Fenyo D, Lisacek F, Murthy SK, Karger BL, Brusic V. Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells--a benchmarking study. BMC Genomics 2014; 15 Suppl 9:S1. [PMID: 25521637 PMCID: PMC4290587 DOI: 10.1186/1471-2164-15-s9-s1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and a combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples. RESULTS Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80% of expressed proteins were present in MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each run of the small sample proteomics. These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The identified canonical pathways were largely overlapping between individual runs. Of the identified pathways, 182 were shared between three individual small sample runs. CONCLUSIONS Current technologies enable us to directly detect 10% of expressed proteomes from small samples comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome that can be verified by targeted proteomics. This knowledge-based approach includes pathway analysis and a combination of gene expression and protein expression data for target prioritization. Genes present in both the enriched gene sets (canonical pathways collection) and in small sample proteomics data correspond to approximately 50% of expressed proteomes in larger sample proteomics data. In addition, 90% of targets from canonical pathways were estimated to be expressed. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10% of expressed proteins could not be matched with the expressed transcripts.
|
35
|
A derived network-based interferon-related signature of human macrophages responding to Mycobacterium tuberculosis. BIOMED RESEARCH INTERNATIONAL 2014; 2014:713071. [PMID: 25371902 PMCID: PMC4209755 DOI: 10.1155/2014/713071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/23/2014] [Revised: 07/09/2014] [Accepted: 07/11/2014] [Indexed: 12/11/2022]
Abstract
Network analysis of a transcriptional signature typically relies on direct interaction between two highly expressed genes. However, this approach misses indirect and biologically relevant interactions through a third factor (hub). Here we determine whether a hub-based network analysis can select an improved signature subset that correlates with a biological change in a stronger manner than the original signature. We have previously reported an interferon-related transcriptional signature (THP1r2Mtb-induced) from Mycobacterium tuberculosis (M. tb)-infected THP-1 human macrophages. We selected hub-connected THP1r2Mtb-induced genes into the refined network signature TMtb-iNet and grouped the excluded genes into the excluded signature TMtb-iEx. TMtb-iNet retained the enrichment of binding sites of interferon-related transcription factors and contained relatively more interferon-related interacting genes when compared to the THP1r2Mtb-induced signature. TMtb-iNet correlated as strongly as the THP1r2Mtb-induced signature on a public transcriptional dataset of patients with pulmonary tuberculosis (PTB). TMtb-iNet correlated more strongly in CD4(+) and CD8(+) T cells from PTB patients than the THP1r2Mtb-induced signature and TMtb-iEx. When TMtb-iNet was applied to data during clinical therapy of tuberculosis, it resulted in the most pronounced response and the weakest correlation. Correlation on datasets from patients with AIDS or malaria was stronger for TMtb-iNet, indicating an involvement of TMtb-iNet in these chronic human infections. Collectively, the significance of this work is twofold: (1) we disseminate a hub-based approach in generating a biologically meaningful and clinically useful signature; (2) using this approach we introduce a new network-based signature and demonstrate its promising applications in understanding host responses to infections.
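The idea of splitting a signature into hub-connected and excluded genes, as described in this abstract, can be sketched as below. The hub criterion used here (any node that interacts with at least `min_links` signature genes) is an assumption made for illustration, as are the function name `split_by_hub` and the gene labels; the abstract does not specify how hubs were defined.

```python
def split_by_hub(signature, interactions, min_links=2):
    """Split a signature into hub-connected and excluded genes.

    Assumption: a hub is any node that interacts with at least `min_links`
    signature genes; the published criterion may differ.
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbors = {}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hubs = {node for node, nbrs in neighbors.items()
            if len(nbrs & signature) >= min_links}
    connected = {gene for gene in signature
                 if neighbors.get(gene, set()) & hubs}
    return connected, signature - connected


# Hypothetical signature and interaction list.
signature = {"G1", "G2", "G3", "G4"}
interactions = [("HUB", "G1"), ("HUB", "G2"), ("X", "G3")]

refined, excluded = split_by_hub(signature, interactions)
print(sorted(refined), sorted(excluded))
```

In this toy input, `G1` and `G2` never interact directly yet are grouped together through `HUB`, which is the kind of indirect relationship a purely pairwise analysis would miss.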
|
36
|
Haenen S, Clynen E, Nemery B, Hoet PH, Vanoirbeek JA. Biomarker discovery in asthma and COPD: Application of proteomics techniques in human and mice. EUPA OPEN PROTEOMICS 2014. [DOI: 10.1016/j.euprot.2014.04.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
|
37
|
Pathway and network analysis in proteomics. J Theor Biol 2014; 362:44-52. [PMID: 24911777 DOI: 10.1016/j.jtbi.2014.05.031] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 01/01/2014] [Revised: 05/15/2014] [Accepted: 05/21/2014] [Indexed: 12/14/2022]
Abstract
Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address the challenge in proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
|
38
|
Boyanova D, Nilla S, Klau GW, Dandekar T, Müller T, Dittrich M. Functional module search in protein networks based on semantic similarity improves the analysis of proteomics data. Mol Cell Proteomics 2014; 13:1877-89. [PMID: 24807868 DOI: 10.1074/mcp.m113.032839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023] Open
Abstract
The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, where the biological interpretation of proteins can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1), and demonstrate that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks, each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system.
Affiliation(s)
- Desislava Boyanova
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
| | - Santosh Nilla
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
| | - Gunnar W Klau
- Life Sciences, Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands
| | - Thomas Dandekar
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
| | - Tobias Müller
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
| | - Marcus Dittrich
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
| |
Collapse
|
39
|
GeneSense: a new approach for human gene annotation integrated with protein-protein interaction networks. Sci Rep 2014; 4:4474. [PMID: 24667292 PMCID: PMC3966033 DOI: 10.1038/srep04474] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 11/07/2013] [Accepted: 03/10/2014] [Indexed: 12/29/2022] Open
Abstract
Virtually all cellular functions involve protein-protein interactions (PPIs). As an increasing number of PPIs are identified and vast amounts of information accumulate, researchers are finding different ways to interrogate the data and understand the interactions in context. However, it is widely recognized that a significant portion of the data is scattered, redundant, not considered high quality, and not readily accessible to researchers in a systematic fashion. In addition, it is challenging to identify the optimal protein targets in the current PPI networks. The GeneSense server was developed to integrate gene annotation and PPI networks in an expandable architecture that incorporates selected databases with the aim to assemble, analyze, evaluate and disseminate protein-protein association information in a comprehensive and user-friendly manner. Three network models, namely nodenet, leafnet and loopnet, are used to identify the optimal protein targets in the complex networks. GeneSense is freely available at www.biomedsense.org/genesense.php.
|
40
|
Goh WWB, Wong L. Computational proteomics: designing a comprehensive analytical strategy. Drug Discov Today 2014; 19:266-74. [DOI: 10.1016/j.drudis.2013.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/04/2013] [Revised: 06/28/2013] [Accepted: 07/11/2013] [Indexed: 02/02/2023]
|
41
|
Contemporary network proteomics and its requirements. Biology 2013; 3:22-38. [PMID: 24833333 PMCID: PMC4009760 DOI: 10.3390/biology3010022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/12/2013] [Revised: 12/15/2013] [Accepted: 12/16/2013] [Indexed: 01/10/2023]
Abstract
The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics on the other hand has to develop further on two critical factors: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1) we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically, the need for context-specific networks and essential considerations for localized network analysis.
|
42
|
Affiliation(s)
- Dirk Benndorf
- Department of Bioprocess Engineering; Otto von Guericke University Magdeburg; Magdeburg Germany
- Udo Reichl
- Department of Bioprocess Engineering; Otto von Guericke University Magdeburg; Magdeburg Germany
- Department of Bioprocess Engineering; Max Planck Institute for Dynamics of Complex Technical Systems; Magdeburg Germany
|
43
|
Muth T, Benndorf D, Reichl U, Rapp E, Martens L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol Biosyst 2013; 9:578-85. [PMID: 23238088 DOI: 10.1039/c2mb25415h] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/18/2022]
Abstract
In the past years the integral study of microbial communities of varying complexity has gained increasing research interest. Mass spectrometry-driven metaproteomics enables the analysis of such communities on the functional level, but this fledgling field still faces various technical and semantic challenges regarding experimental data analysis and interpretation. In the present review, we outline the hurdles involved and attempt to cover the most valuable methods and software implementations available to researchers in the field today. Beyond merely focusing on protein identification, we provide an overview on different data pre- and post-processing steps, such as metabolic pathway analysis, that can be useful in a typical metaproteomics workflow. Finally, we briefly discuss directions for future work.
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Magdeburg, Germany
|
44
|
Cui L, Lee YH, Kumar Y, Xu F, Lu K, Ooi EE, Tannenbaum SR, Ong CN. Serum metabolome and lipidome changes in adult patients with primary dengue infection. PLoS Negl Trop Dis 2013; 7:e2373. [PMID: 23967362 PMCID: PMC3744433 DOI: 10.1371/journal.pntd.0002373] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 02/08/2013] [Accepted: 07/02/2013] [Indexed: 12/22/2022] Open
Abstract
Background: Dengue virus (DENV) is the most widespread arbovirus with an estimated 100 million infections occurring every year. Endemic in the tropical and subtropical areas of the world, dengue fever/dengue hemorrhagic fever (DF/DHF) is emerging as a major public health concern. The complex array of concurrent host physiologic changes has hampered a complete understanding of underlying molecular mechanisms of dengue pathogenesis. Methodology/Principal Findings: Systems level characterization of serum metabolome and lipidome of adult DF patients at early febrile, defervescence, and convalescent stages of DENV infection was performed using liquid chromatography- and gas chromatography-mass spectrometry. The tractability of following metabolite and lipid changes in a relatively large sample size (n = 44) across three prominent infection stages allowed the identification of critical physiologic changes that coincided with the different stages. Sixty differential metabolites were identified in our metabolomics analysis and the main metabolite classes were free fatty acids, acylcarnitines, phospholipids, and amino acids. Major perturbed metabolic pathways included fatty acid biosynthesis and β-oxidation, phospholipid catabolism, steroid hormone pathway, etc., suggesting the multifactorial nature of human host responses. Analysis of phospholipids and sphingolipids verified the temporal trends and revealed association with lymphocyte and platelet numbers. These metabolites were significantly perturbed during the early stages, and normalized to control levels at convalescent stage, suggesting their potential utility as prognostic markers. Conclusions/Significance: DENV infection causes temporally distinct serum metabolome and lipidome changes, and many of the differential metabolites are involved in acute inflammatory responses.
Our global analyses revealed early anti-inflammatory responses working in concert to modulate early pro-inflammatory processes, thus preventing the host from development of pathologies by excessive or prolonged inflammation. This study is the first example of how an omic- approach can divulge the extensive, concurrent, and dynamic host responses elicited by DENV and offers plausible physiological insights to why DF is self limiting. Dengue virus is the most widespread arbovirus and a major public health threat in the tropical and subtropical areas of the world. As yet, little is known about the molecular mechanisms underlying infection, and there is no specific treatment or vaccine that is currently effective against the disease. Metabolomics and lipidomics provide global views of metabolome and lipidome landscapes and implicate metabolic to disease phenotype. We performed serum metabolic and lipidomic profiling on a cohort of dengue patients with three sampling time points at early febrile, defervescence, and convalescent stages via mass spectrometry-based analytical platforms. Compared with healthy subjects, approximately two hundred metabolites showed significant difference in dengue patients, and 60 were identified. This study revealed that in primary dengue infection, the host metabolome is tightly regulated, with active, early anti-inflammatory processes modulating the pro-inflammatory processes, suggesting the self-limiting phenotype of dengue fever. Major perturbed metabolic pathways included fatty acid biosynthesis, fatty acid β-oxidation, phospholipid catabolism, steroid hormone pathway, etc. This represents a first report on the characterization of the serum metabolome and significantly advances our understanding on host and dengue virus interactions. These differential metabolites have the potential as biomarkers for disease monitoring and evaluation of therapeutic interventions.
Affiliation(s)
- Liang Cui
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology (SMART), Singapore
- Yie Hou Lee
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology (SMART), Singapore
- Yadunanda Kumar
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology (SMART), Singapore
- Fengguo Xu
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Kun Lu
- Departments of Biological Engineering and Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Eng Eong Ooi
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology (SMART), Singapore
- DUKE-NUS Graduate Medical School, Singapore
- Steven R. Tannenbaum
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology (SMART), Singapore
- Departments of Biological Engineering and Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Choon Nam Ong
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- NUS Environment Research Institute, Singapore
|
45
|
Goh WWB, Sergot MJ, Sng JCG, Sng JC, Wong L. Comparative network-based recovery analysis and proteomic profiling of neurological changes in valproic acid-treated mice. J Proteome Res 2013; 12:2116-27. [PMID: 23557376 PMCID: PMC3805323 DOI: 10.1021/pr301127f] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/24/2022]
Abstract
Despite its prominence for characterization of complex mixtures,
LC–MS/MS frequently fails to identify many proteins. Network-based
analysis methods, based on protein–protein interaction networks
(PPINs), biological pathways, and protein complexes, are useful for
recovering non-detected proteins, thereby enhancing analytical resolution.
However, network-based analysis methods do come in varied flavors
for which the respective efficacies are largely unknown. We compare
the recovery performance and functional insights from three distinct
instances of PPIN-based approaches, viz., Proteomics Expansion Pipeline
(PEP), Functional Class Scoring (FCS), and Maxlink, in a test scenario
of valproic acid (VPA)-treated mice. We find that the most comprehensive
functional insights, as well as best non-detected protein recovery
performance, are derived from FCS utilizing real biological complexes.
This outstrips other network-based methods such as Maxlink or Proteomics
Expansion Pipeline (PEP). From FCS, we identified known biological
complexes involved in epigenetic modifications, neuronal system development,
and cytoskeletal rearrangements. This is congruent with the observed
phenotype where adult mice showed an increase in dendritic branching
to allow the rewiring of visual cortical circuitry and an improvement
in their visual acuity when tested behaviorally. In addition, PEP
also identified a novel complex, comprising YWHAB, NR1, NR2B, ACTB,
and TJP1, which is functionally related to the observed phenotype.
Although our results suggest different network analysis methods can
produce different results, on the whole, the findings are mutually
supportive. More critically, the non-overlapping information each
provides can provide greater holistic understanding of complex phenotypes.
|
46
|
Goh WWB, Wong L. Networks in proteomics analysis of cancer. Curr Opin Biotechnol 2013; 24:1122-8. [PMID: 23481377 DOI: 10.1016/j.copbio.2013.02.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 11/27/2012] [Revised: 01/07/2013] [Accepted: 02/09/2013] [Indexed: 01/08/2023]
Abstract
Proteomics provides direct biological information on proteins but is still a limited platform. Borrowing from genomics, its cancer-specific applications can be broadly categorized as (1) pure diagnostics, (2) biomarkers, (3) identification of root causes and (4) identification of cancer-specific network rewirings. Biological networks capture complex relationships between proteins and provide an appropriate means of contextualization. While playing significantly larger roles, especially in 1 and 3, progress in proteomics-specific network-based methods is lagging as compared to genomics. Rapid hardware advances and improvements in proteomic identification and quantification have given rise to much better quality data alongside advent of new network-based analysis methods. However, a tighter integration between analytics and hardware is still essential for network analysis to play more significant roles in proteomics analysis.
Affiliation(s)
- Wilson Wen Bin Goh
- Department of Computer Science, National University of Singapore, COM1 Building, 13 Computing Drive, Singapore 117417, Singapore; Department of Computing, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
|
47
|
Goh WWB, Fan M, Low HS, Sergot M, Wong L. Enhancing the utility of Proteomics Signature Profiling (PSP) with Pathway Derived Subnets (PDSs), performance analysis and specialised ontologies. BMC Genomics 2013; 14:35. [PMID: 23324392 PMCID: PMC3636053 DOI: 10.1186/1471-2164-14-35] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/28/2012] [Accepted: 11/01/2012] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Proteomics Signature Profiling (PSP) is a novel hit-rate based method that proved useful in resolving consistency and coverage issues in proteomics. As a follow-up study, several points need to be addressed: 1/ PSP's generalisability to pathways, 2/ understanding the biological interplay between significant complexes and pathway subnets co-located on the same pathways on our liver cancer dataset, 3/ understanding PSP's false positive rate and 4/ demonstrating that PSP works on other suitable proteomics datasets as well as expanding PSP's analytical resolution via the use of specialised ontologies. RESULTS 1/ PSP performs well with Pathway-Derived Subnets (PDSs). Comparing the performance of PDSs derived from various pathway databases, we find that an integrative approach is best for optimising analytical resolution. Feature selection also confirms that significant PDSs are closely connected to the cancer phenotype. 2/ In liver cancer, correlation studies of significant PSP complexes and PDSs co-localised on the same pathways revealed an interesting relationship between the purine metabolism pathway and two other complexes involved in DNA repair. Our work suggests progression to poor stage requires additional mutations that disrupt DNA repair enzymes. 3/ False positive analysis reveals that PSP, applied on both complexes and PDSs, is powerful and precise. 4/ Via an expert-curated lipid ontology, we uncovered several interesting lipid-associated complexes that could be associated with cancer progression. Of particular interest is the HMGB1-HMGB2-HSC70-ERP60-GAPDH complex which is also involved in DNA repair. We also demonstrated generalisability of PSP using a non-small-cell lung carcinoma data set. CONCLUSIONS PSP is a powerful and precise technique, capable of identifying biologically coherent features. It works with biological complexes, network-predicted clusters as well as PDSs.
Here, an instance of the interplay between significant PDSs and complexes, possibly significantly involved in liver cancer progression but not well understood as yet, is demonstrated. Also demonstrated is the enhancement of PSP's analytical resolution using specialised ontologies.
Affiliation(s)
- Wilson Wen Bin Goh
- Department of Computing, Imperial College London, London, United Kingdom
|
48
|
Wrangling phosphoproteomic data to elucidate cancer signaling pathways. PLoS One 2013; 8:e52884. [PMID: 23300999 PMCID: PMC3536783 DOI: 10.1371/journal.pone.0052884] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/26/2012] [Accepted: 11/22/2012] [Indexed: 12/02/2022] Open
Abstract
The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations was evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer.
Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.
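The abstract above describes computing dissimilarities from Spearman correlation and Euclidean distance in the presence of many missing values, but does not state how the two are combined or how missing entries are handled. The following is a minimal Python sketch of one plausible reading, not the authors' R implementation: it uses only the samples observed in both profiles (pairwise-complete observations), and simply sums a Spearman-based term and a per-sample-normalised Euclidean term. The function name `pairwise_dissimilarity`, the `min_shared` threshold, the use of `None` for missing values and the unweighted sum are all assumptions made for illustration.

```python
import math


def _ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def pairwise_dissimilarity(a, b, min_shared=3):
    """Hypothetical combined dissimilarity of two profiles (None = missing).

    Returns None when fewer than `min_shared` samples are observed in both
    profiles, or when the rank correlation is undefined.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_shared:
        return None
    xs, ys = zip(*shared)
    # Spearman correlation = Pearson correlation of the ranks.
    rho = _pearson(_ranks(list(xs)), _ranks(list(ys)))
    if rho is None:
        return None
    spearman_term = 1.0 - rho  # 0 for identical ordering, 2 for reversed
    # Root-mean-square difference, so pairs with different overlap sizes
    # remain comparable (an assumption, not taken from the abstract).
    euclid_term = math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))
    return spearman_term + euclid_term


profile_a = [1.0, 2.0, None, 4.0, 5.0]
profile_b = [2.0, 4.0, 6.0, None, 10.0]
print(pairwise_dissimilarity(profile_a, profile_b))
```

Pairs that return `None` have too little overlap to compare; how such pairs should be treated downstream (for example, assigned a maximal dissimilarity before embedding) is a separate design choice that the abstract does not specify.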
|
49
|
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013; 41:D808-15. [PMID: 23203871 PMCID: PMC3531103 DOI: 10.1093/nar/gks1094] [Citation(s) in RCA: 3247] [Impact Index Per Article: 295.2] [Received: 09/15/2012] [Revised: 10/15/2012] [Accepted: 10/18/2012] [Indexed: 12/12/2022] Open
Abstract
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
Affiliation(s)
- Andrea Franceschini, Damian Szklarczyk, Sune Frankild, Michael Kuhn, Milan Simonovic, Alexander Roth, Jianyi Lin, Pablo Minguez, Peer Bork, Christian von Mering, Lars J. Jensen
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
|
50
|
Wright PC, Jaffe S, Noirel J, Zou X. Opportunities for protein interaction network-guided cellular engineering. IUBMB Life 2012; 65:17-27. [DOI: 10.1002/iub.1114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/23/2012] [Revised: 10/14/2012] [Accepted: 10/15/2012] [Indexed: 01/23/2023]
|