1
Wu W, Huang Z, Kong W, Peng H, Goh WWB. Optimizing the PROTREC network-based missing protein prediction algorithm. Proteomics 2024; 24:e2200332. [PMID: 37876146 DOI: 10.1002/pmic.202200332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 09/30/2023] [Accepted: 10/06/2023] [Indexed: 10/26/2023]
Abstract
This article summarizes the PROTREC method and investigates the impact that the different hyper-parameters have on the task of missing protein prediction using PROTREC. We evaluate missing protein recovery rates using different PROTREC score selection approaches (MAX, MIN, MEDIAN, and MEAN), different PROTREC score thresholds, as well as different complex size thresholds. In addition, we included two additional cancer datasets in our analysis and introduced a new validation method to check both the robustness of the PROTREC method as well as the correctness of our analysis. Our analysis showed that the missing protein recovery rate can be improved by adopting PROTREC score selection operations of MIN, MEDIAN, and MEAN instead of the default MAX. However, this may come at a cost of reduced numbers of proteins predicted and validated. The users should therefore choose their hyper-parameters carefully to find a balance in the accuracy-quantity trade-off. We also explored the possibility of combining PROTREC with a p-value-based method (FCS) and demonstrated that PROTREC is able to perform well independently without any help from a p-value-based method. Furthermore, we conducted a downstream enrichment analysis to understand the biological pathways and protein networks within the cancerous tissues using the recovered proteins.
- Missing protein recovery rate using PROTREC can be improved by selecting a different PROTREC score selection method.
- Different PROTREC score selection methods and other hyper-parameters such as PROTREC score threshold and complex size threshold introduce accuracy-quantity trade-off.
- PROTREC is able to perform well independently of any filtering using a p-value-based method.
- Verification of the PROTREC method on additional cancer datasets.
- Downstream Enrichment Analysis to understand the biological pathways and protein networks in cancerous tissues.
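The score-selection step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each protein complex already carries a single PROTREC-style score between 0 and 1, and the names `select_scores`, `complex_scores`, `complex_members`, `min_size`, and `threshold` are hypothetical. A protein that belongs to several complexes receives one score per complex, and the chosen operation (MAX, MIN, MEDIAN, or MEAN) collapses them into one value before thresholding.

```python
from statistics import mean, median

# Hypothetical mapping from selection mode to aggregation function.
AGGREGATORS = {"MAX": max, "MIN": min, "MEDIAN": median, "MEAN": mean}


def select_scores(complex_scores, complex_members, mode="MAX",
                  min_size=5, threshold=0.5):
    """Collapse per-complex scores into one score per protein.

    complex_scores  : dict complex_id -> score (assumed precomputed)
    complex_members : dict complex_id -> set of protein ids
    mode            : one of MAX, MIN, MEDIAN, MEAN
    min_size        : complexes smaller than this are ignored
    threshold       : proteins scoring below this are dropped
    """
    aggregate = AGGREGATORS[mode]

    # Collect every complex score that applies to each protein.
    per_protein = {}
    for complex_id, score in complex_scores.items():
        members = complex_members[complex_id]
        if len(members) < min_size:
            continue
        for protein in members:
            per_protein.setdefault(protein, []).append(score)

    # Aggregate, then keep only proteins that clear the threshold.
    selected = {}
    for protein, scores in per_protein.items():
        value = aggregate(scores)
        if value >= threshold:
            selected[protein] = value
    return selected


members = {"c1": {"A", "B", "C"}, "c2": {"A", "D", "E"}}
scores = {"c1": 0.875, "c2": 0.25}
for mode in AGGREGATORS:
    print(mode, sorted(select_scores(scores, members, mode, min_size=3)))
```

In this toy input, protein A sits in one high-scoring and one low-scoring complex, so MIN drops it while MAX keeps it, which mirrors the accuracy-quantity trade-off the abstract describes.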
Affiliation(s)
- Wenshan Wu
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Zelu Huang
  - School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
  - Department of Computer Science, National University of Singapore, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
2
Resolving missing protein problems using functional class scoring. Sci Rep 2022; 12:11358. [PMID: 35790756 PMCID: PMC9256666 DOI: 10.1038/s41598-022-15314-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 06/22/2022] [Indexed: 11/29/2022] Open
Abstract
Despite technological advances in proteomics, incomplete coverage and inconsistency issues persist, resulting in “data holes”. These data holes cause the missing protein problem (MPP), where relevant proteins are persistently unobserved, or sporadically observed across samples, hindering biomarker discovery and proper functional characterization. Network-based approaches can provide powerful solutions for resolving these issues. Functional Class Scoring (FCS) is one such method that uses protein complex information to recover missing proteins with weak support. However, FCS has not been evaluated on more recent proteomic technologies with higher coverage, and there is no clear way to evaluate its performance. To address these issues, we devised a more rigorous evaluation schema based on cross-verification between technical replicates and evaluated its performance on data acquired under recent Data-Independent Acquisition (DIA) technologies (viz. SWATH). Although cross-replicate examination reveals some inconsistencies amongst same-class samples, tissue-differentiating signal is nonetheless strongly conserved, confirming that FCS selects for biologically meaningful networks. We also report that predicted missing proteins are statistically significant based on FCS p values. Despite limited cross-replicate verification rates, the predicted missing proteins as a whole have higher peptide support than non-predicted proteins. FCS also predicts missing proteins that are often lost due to weak specific peptide support.
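One way to picture the cross-replicate check described in this abstract is the sketch below. It is an illustration under stated assumptions and not the published evaluation schema: a protein predicted as missing in one technical replicate is counted as verified if it was directly observed in the paired replicate. The function name `cross_replicate_recovery` and its arguments are hypothetical.

```python
def cross_replicate_recovery(predicted, observed_here, observed_in_replicate):
    """Return (recovery_rate, verified_proteins) for one replicate pair.

    predicted             : proteins a method predicts to be present
    observed_here         : proteins already detected in this replicate
    observed_in_replicate : proteins detected in the paired replicate
    """
    # Only proteins that were NOT observed here count as missing-protein
    # predictions; already-observed proteins need no recovery.
    candidates = set(predicted) - set(observed_here)
    if not candidates:
        return 0.0, set()

    # A prediction is verified when the paired replicate detected it.
    verified = candidates & set(observed_in_replicate)
    return len(verified) / len(candidates), verified


rate, verified = cross_replicate_recovery(
    predicted={"P1", "P2", "P3", "P4"},
    observed_here={"P4"},
    observed_in_replicate={"P1", "P3", "P9"},
)
print(f"recovery rate = {rate:.2f}, verified = {sorted(verified)}")
```

A low rate from such a check does not by itself show that predictions are wrong, since the paired replicate may have missed the same protein.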
3
Kong W, Wong BJH, Gao H, Guo T, Liu X, Du X, Wong L, Goh WWB. PROTREC: A probability-based approach for recovering missing proteins based on biological networks. J Proteomics 2022; 250:104392. [PMID: 34626823 DOI: 10.1016/j.jprot.2021.104392] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/13/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/18/2022]
Abstract
A novel network-based approach for predicting missing proteins (MPs) is proposed here. This approach, PROTREC (short for PROtein RECovery), dominates existing network-based methods - such as Functional Class Scoring (FCS), Hypergeometric Enrichment (HE), and Gene Set Enrichment Analysis (GSEA) - across a variety of proteomics datasets derived from different proteomics data acquisition paradigms: Higher PROTREC scores are much more closely correlated with higher recovery rates of MPs across sample replicates. The PROTREC score, unlike methods reporting p-values, can be directly interpreted as the probability that an unreported protein in a proteomic screen is actually present in the sample being screened. SIGNIFICANCE: Mass spectrometry (MS) has developed rapidly in recent years; however, an obvious proportion of proteins is still undetected, leading to missing protein problems. A few existing protein recovery methods are based on biological networks, but the performance is not satisfactory. We propose a new protein recovery method, PROTREC, a Bayesian-inspired approach based on biological networks, which shows exceptional performance across multiple validation strategies. It does not rely on peptide information, so it avoids the ambiguity issue that most protein assembly methods face.
Affiliation(s)
- Weijia Kong
  - School of Biological Sciences, Nanyang Technological University, Singapore
  - Department of Computer Science, National University of Singapore, Singapore
- Huanhuan Gao
  - Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Tiannan Guo
  - Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Xianming Liu
  - Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Xiaoxian Du
  - Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
4
Differential Expression of Multiple Disease-Related Protein Groups Induced by Valproic Acid in Human SH-SY5Y Neuroblastoma Cells. Brain Sci 2020; 10:brainsci10080545. [PMID: 32806546 PMCID: PMC7465595 DOI: 10.3390/brainsci10080545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/10/2020] [Revised: 08/07/2020] [Accepted: 08/08/2020] [Indexed: 12/23/2022] Open
Abstract
Valproic acid (VPA) is a multifunctional medication used for the treatment of epilepsy, mania associated with bipolar disorder, and migraine. The pharmacological effects of VPA involve a variety of neurotransmitter and cell signaling systems, but the molecular mechanisms underlying its clinical efficacy are to date largely unknown. In this study, we used isobaric tags for relative and absolute quantitation shotgun proteomic analysis to screen differentially expressed proteins in VPA-treated SH-SY5Y cells. We identified changes in the expression levels of multiple proteins involved in Alzheimer’s disease, Parkinson’s disease, chromatin remodeling, controlling gene expression via the vitamin D receptor, ribosome biogenesis, ubiquitin-mediated proteolysis, and the mitochondrial oxidative phosphorylation and electron transport chain. Our data indicate that VPA may modulate the differential expression of proteins involved in mitochondrial function and vitamin D receptor-mediated chromatin transcriptional regulation and proteins implicated in the pathogenesis of neurodegenerative diseases.
5
Goh WWB, Wong L. Advanced bioinformatics methods for practical applications in proteomics. Brief Bioinform 2019; 20:347-355. [PMID: 30657890 DOI: 10.1093/bib/bbx128] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/14/2017] [Indexed: 12/22/2022] Open
Abstract
Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data, and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When a protein is observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If it is never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.
6
Proteomic investigation of intra-tumor heterogeneity using network-based contextualization - A case study on prostate cancer. J Proteomics 2019; 206:103446. [PMID: 31323421 DOI: 10.1016/j.jprot.2019.103446] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/17/2019] [Revised: 06/12/2019] [Accepted: 07/08/2019] [Indexed: 12/26/2022]
Abstract
Cancer is a heterogeneous disease, confounding the identification of relevant markers and drug targets. Network-based analysis is robust against noise, potentially offering a promising approach towards biomarker identification. We describe here the application of two network-based methods, qPSP (Quantitative Proteomics Signature Profiling) and PFSNet (Paired Fuzzy SubNetworks), in an intra-tissue proteome data set of prostate tissue samples. Despite high basal variation, we find that traditional statistical analysis may exaggerate the extent of heterogeneity. We also report that network-based analysis outperforms protein-based feature selection with concomitantly higher cross-validation accuracy. Overall, network-based analysis provides emergent signal that boosts sensitivity while retaining good precision. It is a potential means of circumventing heterogeneity for stable biomarker discovery.
7
Zhao Y, Sue ACH, Goh WWB. Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data. J Bioinform Comput Biol 2019; 17:1950013. [DOI: 10.1142/s0219720019500136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from new proteomics technology (SWATH) and also check for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluate the objectivity of the FCS p-value and follow up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective and are strongly confounded by complex size; (2) the best recovery performance does not necessarily lie at standard p-value cutoffs; (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes and are further confounded by issues relating to network coverage and quality; and (4) moderate-sized complexes of size 5 to 10 still exhibit considerable instability, and FCS works best with big complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
Affiliation(s)
- Yaxing Zhao
  - School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 300072 Tianjin, P. R. China
- Andrew Chi-Hau Sue
  - School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 300072 Tianjin, P. R. China
- Wilson Wen Bin Goh
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
8
Zhou L, Wong L, Goh WWB. Understanding missing proteins: a functional perspective. Drug Discov Today 2018; 23:644-651. [DOI: 10.1016/j.drudis.2017.11.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/07/2017] [Revised: 10/24/2017] [Accepted: 11/13/2017] [Indexed: 01/03/2023]
9
Abstract
Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/, and online documentation is available at http://rpubs.com/gohwils/204259.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
  - Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074
10
Goh WWB, Wong L. Class-paired Fuzzy SubNETs: A paired variant of the rank-based network analysis family for feature selection based on protein complexes. Proteomics 2017; 17:e1700093. [PMID: 28390171 DOI: 10.1002/pmic.201700093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/09/2017] [Accepted: 04/05/2017] [Indexed: 01/12/2023]
Abstract
Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank-defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, P. R. China
  - Department of Computer Science, National University of Singapore, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore
  - Department of Pathology, National University of Singapore, Singapore
11
Goh WWB, Wong L. Protein complex-based analysis is resistant to the obfuscating consequences of batch effects --- a case study in clinical proteomics. BMC Genomics 2017; 18:142. [PMID: 28361693 PMCID: PMC5374662 DOI: 10.1186/s12864-017-3490-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/06/2023] Open
Abstract
Background: In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research. Results: Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs highly correlated with batch but not class effect. However, neither PC-based nor existing batch effect-correction methods adequately address subtle batch effects, which are difficult to eradicate, and both involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch-effect resistant methods and demonstrate how such methods incorporating protein complexes are particularly resistant to batch effect without compromising data integrity. Conclusions: Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance against batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, 300072, People's Republic of China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
  - Department of Pathology, National University of Singapore, Singapore, Singapore
12
Lee PKM, Goh WWB, Sng JCG. Network-based characterization of the synaptic proteome reveals that removal of epigenetic regulator Prmt8 restricts proteins associated with synaptic maturation. J Neurochem 2017; 140:613-628. [PMID: 27935040 DOI: 10.1111/jnc.13921] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/08/2016] [Revised: 11/30/2016] [Accepted: 12/04/2016] [Indexed: 12/13/2022]
Abstract
The brain adapts to dynamic environmental conditions by altering its epigenetic state, thereby influencing neuronal transcriptional programs. An example of an epigenetic modification is protein methylation, catalyzed by protein arginine methyltransferases (PRMT). One member, Prmt8, is selectively expressed in the central nervous system during a crucial phase of early development, but little else is known regarding its function. We hypothesize Prmt8 plays a role in synaptic maturation during development. To evaluate this, we used a proteome-wide approach to characterize the synaptic proteome of Prmt8 knockout versus wild-type mice. Through comparative network-based analyses, proteins and functional clusters related to neurite development were identified to be differentially regulated between the two genotypes. One interesting protein that was differentially regulated was tenascin-R (TNR). Chromatin immunoprecipitation demonstrated binding of PRMT8 to the tenascin-r (Tnr) promoter. TNR, a component of perineuronal nets, preserves structural integrity of synaptic connections within neuronal networks during the development of visual-somatosensory cortices. On closer inspection, Prmt8 removal increased net formation and decreased inhibitory parvalbumin-positive (PV+) puncta on pyramidal neurons, thereby hindering the maturation of circuits. Consequently, visual acuity of the knockout mice was reduced. Our results demonstrated Prmt8's involvement in synaptic maturation and its prospect as an epigenetic modulator of developmental neuroplasticity by regulating structural elements such as the perineuronal nets.
Affiliation(s)
- Patrick Kia Ming Lee
  - Integrative Neuroscience Program, Singapore Institute for Clinical Sciences, Agency for Science Technology and Research (A*STAR), Singapore
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
  - Department of Computer Science, National University of Singapore, Singapore
- Judy Chia Ghee Sng
  - Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
13
Goh WWB. Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics. BMC Med Genomics 2016; 9:67. [PMID: 28117654 PMCID: PMC5260792 DOI: 10.1186/s12920-016-0228-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023] Open
Abstract
Background: The hypergeometric enrichment analysis approach typically fares poorly in feature-selection stability due to its upstream reliance on the t-test to generate differential protein lists before testing for enrichment on a protein complex, subnetwork or gene group. Methods: Swapping the t-test in favour of a fuzzy rank-based weight system similar to that used in network-based methods like Quantitative Proteomics Signature Profiling (QPSP), Fuzzy SubNets (FSNET) and paired FSNET (PFSNET) produces dramatic improvements. Results: This approach, Fuzzy-FishNET, exhibits high precision-recall over three sets of simulated data (with simulated protein complexes) while excelling in feature-selection reproducibility on real data (based on evaluation with real protein complexes). Overlap comparisons with PFSNET show that Fuzzy-FishNET selects the most significant complexes, which are also strongly class-discriminative. Cross-validation further demonstrates that Fuzzy-FishNET selects class-relevant protein complexes. Conclusions: Based on evaluation with simulated and real datasets, Fuzzy-FishNET is a significant upgrade of the traditional hypergeometric enrichment approach and a powerful new entrant amongst comparative proteomics analysis methods.
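For reference, the traditional hypergeometric enrichment test mentioned in this abstract can be sketched as an upper-tail probability. This is a generic textbook formulation and not code from the paper; the function name `hypergeom_enrichment_p` and the argument names (`population`, `complex_size`, `selected`, `overlap`) are hypothetical, and the differential protein list is assumed to come from an upstream step such as a t-test.

```python
from math import comb


def hypergeom_enrichment_p(population, complex_size, selected, overlap):
    """Probability of seeing at least `overlap` complex members by chance.

    population   : number of proteins in the background
    complex_size : number of background proteins in the complex
    selected     : size of the differential protein list
    overlap      : complex members found in the differential list
    """
    total = comb(population, selected)
    upper = min(complex_size, selected)

    # Sum the hypergeometric probability mass from `overlap` upwards.
    tail = sum(
        comb(complex_size, i) * comb(population - complex_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# A 5-protein complex, 4 differential proteins out of 20, 2 of them shared.
print(hypergeom_enrichment_p(20, 5, 4, 2))
```

Because the result depends entirely on which proteins enter the differential list, any instability in that upstream selection carries straight through to the enrichment p-value, which is the weakness the abstract points to.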
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, People's Republic of China
14
Goh WWB, Wong L. Integrating Networks and Proteomics: Moving Forward. Trends Biotechnol 2016; 34:951-959. [DOI: 10.1016/j.tibtech.2016.05.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/19/2016] [Revised: 05/23/2016] [Accepted: 05/24/2016] [Indexed: 11/28/2022]
15
Goh WWB, Wong L. Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms. J Proteome Res 2016; 15:3167-79. [DOI: 10.1021/acs.jproteome.6b00402] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/03/2023]
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
  - Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 117417
16
Goh WWB, Wong L. Evaluating feature-selection stability in next-generation proteomics. J Bioinform Comput Biol 2016; 14:1650029. [PMID: 27640811 DOI: 10.1142/s0219720016500293] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 01/10/2023]
Abstract
Identifying reproducible yet relevant features is a major challenge in biological research. This is well documented in genomics data. Using a proposed set of three reliability benchmarks, we find that this issue exists also in proteomics for commonly used feature-selection methods, e.g. the t-test and recursive feature elimination. Moreover, due to high test variability, selecting the top proteins based on p-value ranks, even when restricted to high-abundance proteins, does not improve reproducibility. Statistical testing based on networks is believed to be more robust, but this does not always hold true: the commonly used hypergeometric enrichment that tests for enrichment of protein subnets performs abysmally due to its dependence on unstable protein pre-selection steps. We demonstrate here for the first time the utility of a novel suite of network-based algorithms called ranked-based network algorithms (RBNAs) on proteomics. These were originally introduced and tested extensively on genomics data. We show here that they are highly stable, reproducible and select relevant features when applied to proteomics data. It is also evident from these results that statistical feature testing on protein expression data should be used with due caution. Careless use of networks does not resolve poor-performance issues, and can even mislead. We recommend augmenting statistical feature-selection methods with concurrent analysis on stability and reproducibility to improve the quality of the selected features prior to experimental validation.
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Singapore
- Limsoon Wong
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Singapore
17
Design principles for clinical network-based proteomics. Drug Discov Today 2016; 21:1130-8. [DOI: 10.1016/j.drudis.2016.05.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/06/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 01/10/2023]
18
Gao SG, Liu RM, Zhao YG, Wang P, Ward DG, Wang GC, Guo XQ, Gu J, Niu WB, Zhang T, Martin A, Guo ZP, Feng XS, Qi YJ, Ma YF. Integrative topological analysis of mass spectrometry data reveals molecular features with clinical relevance in esophageal squamous cell carcinoma. Sci Rep 2016; 6:21586. [PMID: 26898710 PMCID: PMC4761933 DOI: 10.1038/srep21586] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/15/2015] [Accepted: 01/26/2016] [Indexed: 02/06/2023] Open
Abstract
Combining MS-based proteomic data with networks and the topological features of such networks would identify more clinically relevant molecules and meaningfully expand the repertoire of proteins derived from MS analysis. Integrative topological indexes representing 95.96% of the information in seven individual topological measures of node proteins were calculated within a protein-protein interaction (PPI) network, built using 244 differentially expressed proteins (DEPs) identified by iTRAQ 2D-LC-MS/MS. Compared with DEPs, differentially expressed genes (DEGs) and comprehensive features (CFs), structurally dominant nodes (SDNs) based on the integrative topological index distribution produced comparable classification performance in three different clinical settings using five independent gene expression data sets. The signature molecules of the SDN-based classifier for distinguishing early from late clinical TNM stages were enriched in biological traits of protein synthesis, intracellular localization and ribosome biogenesis, which suggests that ribosome biogenesis represents a promising therapeutic target for treating ESCC. In addition, ITGB1 expression, selected exclusively by integrative topological measures, correlated with clinical stages and prognosis, which was further validated with two independent cohorts of ESCC samples. Thus the integrative topological analysis of PPI networks proposed in this study provides an alternative approach to identify potential biomarkers and therapeutic targets from MS/MS data with functional insights in ESCC.
Affiliation(s)
- She-Gan Gao
- Henan Key Laboratory of Cancer Epigenetics, Cancer Institute, The First Affiliated Hospital, College of Clinical Medicine, Henan University of Science and Technology, Luoyang, P. R. China, 471003
- Rui-Min Liu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Yun-Gang Zhao
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Pei Wang
- School of Mathematics and Statistics, Henan University, Kaifeng, Henan 475004, P. R. China
- Douglas G. Ward
- School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Guang-Chao Wang
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Xiang-Qian Guo
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Juan Gu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Wan-Bin Niu
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Tian Zhang
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Ashley Martin
- School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Zhi-Peng Guo
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Xiao-Shan Feng
- Henan Key Laboratory of Cancer Epigenetics, Cancer Institute, The First Affiliated Hospital, College of Clinical Medicine, Henan University of Science and Technology, Luoyang, P. R. China, 471003
- Yi-Jun Qi
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
- Yuan-Fang Ma
- Henan Key Laboratory of Engineering Antibody Medicine, Henan International United Laboratory of Antibody Medicine, Key Laboratory of Cellular and Molecular Immunology, Henan University School of Medicine, Kaifeng 475004, P.R. China
|
19
|
Bin Goh WW, Guo T, Aebersold R, Wong L. Quantitative proteomics signature profiling based on network contextualization. Biol Direct 2015; 10:71. [PMID: 26666224 PMCID: PMC4678536 DOI: 10.1186/s13062-015-0098-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/15/2015] [Accepted: 11/30/2015] [Indexed: 12/02/2022]
Abstract
Background: We present a network-based method, namely quantitative proteomic signature profiling (qPSP), that improves the biological content of proteomic data by converting protein expressions into hit-rates in protein complexes. Results: We demonstrate, using two clinical proteomics datasets, that qPSP produces robust discrimination between phenotype classes (e.g. normal vs. disease) and uncovers phenotype-relevant protein complexes. Regardless of acquisition paradigm, comparisons of qPSP against conventional methods (e.g. t-test or hypergeometric test) demonstrate that it produces more stable and consistent predictions, even at small sample size. We show that qPSP is theoretically robust to noise, and that this robustness to noise is also observable in practice. Comparative analysis of hit-rates and protein expressions in significant complexes reveals that hit-rates are a useful means of summarizing differential behavior in a complex-specific manner. Conclusions: Given qPSP's ability to discriminate phenotype classes even at small sample sizes, high robustness to noise, and better summary statistics, it can be deployed towards analysis of highly heterogeneous clinical proteomics data. Reviewers: This article was reviewed by Frank Eisenhaber and Sebastian Maurer-Stroh. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0098-x) contains supplementary material, which is available to authorized users.
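The idea of converting protein expressions into per-complex hit-rates can be sketched roughly as follows. This is a deliberately simplified illustration, not the published qPSP procedure: it assumes a hit-rate is the fraction of a complex's members that fall among the top-ranked proteins of one sample, and the names `top_proteins`, `hit_rates` and `top_fraction`, as well as the toy data, are hypothetical.

```python
def top_proteins(sample, top_fraction=0.2):
    """Return the set of proteins in the top fraction of a sample by abundance.

    sample: dict mapping protein identifier -> measured abundance
    """
    ranked = sorted(sample, key=sample.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_top])


def hit_rates(sample, complexes, top_fraction=0.2):
    """Fraction of each (non-empty) complex found among the top-ranked proteins.

    complexes: dict mapping complex name -> set of member protein identifiers
    """
    top = top_proteins(sample, top_fraction)
    return {
        name: len(top & members) / len(members)
        for name, members in complexes.items()
    }


# Toy data (hypothetical): four proteins and three two-member complexes.
sample = {"A": 10.0, "B": 9.0, "C": 1.0, "D": 0.5}
complexes = {"cx1": {"A", "B"}, "cx2": {"A", "C"}, "cx3": {"C", "D"}}
rates = hit_rates(sample, complexes, top_fraction=0.5)
print(rates)
```

Each sample is thereby summarized by one value per complex instead of one value per protein, which is the kind of complex-level summary the abstract refers to.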
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin City, 300072, China; Center for Interdisciplinary Cardiovascular Sciences, Harvard Medical School, Boston, USA; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; School of Computing, National University of Singapore, Singapore, Singapore.
- Tiannan Guo
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland.
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore.
|
20
|
Oikawa H, Goh WWB, Lim VKJ, Wong L, Sng JCG. Valproic acid mediates miR-124 to down-regulate a novel protein target, GNAI1. Neurochem Int 2015; 91:62-71. [PMID: 26519098 DOI: 10.1016/j.neuint.2015.10.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/13/2015] [Revised: 10/17/2015] [Accepted: 10/23/2015] [Indexed: 01/07/2023]
Abstract
Valproic acid (VPA) is an anti-convulsant drug that has recently been shown to have neuroregenerative therapeutic actions. In this study, we investigate the underlying molecular mechanism of VPA and its effects on Bdnf transcription through microRNAs (miRNAs) and their corresponding target proteins. Using in silico algorithms, we predicted from our miRNA microarray and iTRAQ data that miR-124 is likely to target guanine nucleotide binding protein alpha inhibitor 1 (GNAI1), an adenylate cyclase inhibitor. With the VPA-mediated reduction of GNAI1, cAMP is enhanced, which increases Bdnf expression. The levels of GNAI1 protein and Bdnf mRNA can be manipulated with either a miR-124 mimic or inhibitor. In summary, we have identified a novel molecular mechanism by which VPA induces miR-124 to repress GNAI1. The implication of the miR-124→GNAI1→BDNF pathway in valproic acid treatment suggests that an old drug, valproic acid, could be repurposed clinically to elevate neurotrophin levels in treating neurodegenerative diseases.
Affiliation(s)
- Hirotaka Oikawa
- Neuroepigenetics Laboratory, Singapore Institute for Clinical Sciences, Agency for Science and Technology (A*STAR), Singapore
- Wilson W B Goh
- School of Pharmaceutical Science and Technology, Tianjin University, China; School of Computing, National University of Singapore, Singapore
- Vania K J Lim
- Neuroepigenetics Laboratory, Singapore Institute for Clinical Sciences, Agency for Science and Technology (A*STAR), Singapore
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore
- Judy C G Sng
- Neuroepigenetics Laboratory, Singapore Institute for Clinical Sciences, Agency for Science and Technology (A*STAR), Singapore; Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
|
21
|
Webb-Robertson BJM, Wiberg HK, Matzke MM, Brown JN, Wang J, McDermott JE, Smith RD, Rodland KD, Metz TO, Pounds JG, Waters KM. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res 2015; 14:1993-2001. [PMID: 25855118 DOI: 10.1021/pr501138h] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Indexed: 12/14/2022]
Abstract
In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate their accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performance with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
Affiliation(s)
- Holli K Wiberg
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Melissa M Matzke
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joseph N Brown
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jing Wang
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jason E McDermott
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Richard D Smith
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Karin D Rodland
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Thomas O Metz
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joel G Pounds
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Katrina M Waters
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
|
22
|
Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve peptide and protein identification in shotgun proteomics. J Proteome Res 2014; 13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/26/2022]
Abstract
Mass spectrometry (MS)-based shotgun proteomics is an effective technology for global proteome profiling. The ultimate goal is to assign tandem MS spectra to peptides and subsequently infer proteins and their abundance. In addition to database searching and protein assembly algorithms, computational approaches have been developed to integrate genomic, transcriptomic, and interactome information to improve peptide and protein identification. Earlier efforts focus primarily on making databases more comprehensive using publicly available genomic and transcriptomic data. More recently, with the increasing affordability of the Next Generation Sequencing (NGS) technologies, personalized protein databases derived from sample-specific genomic and transcriptomic data have emerged as an attractive strategy. In addition, incorporating interactome data not only improves protein identification but also puts identified proteins into their functional context and thus facilitates data interpretation. In this paper, we survey the major integrative bioinformatics approaches that have been developed during the past decade and discuss their merits and demerits.
Affiliation(s)
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, and Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
|
23
|
Goh WWB, Wong L. Computational proteomics: designing a comprehensive analytical strategy. Drug Discov Today 2014; 19:266-74. [DOI: 10.1016/j.drudis.2013.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/04/2013] [Revised: 06/28/2013] [Accepted: 07/11/2013] [Indexed: 02/02/2023]
|
24
|
Contemporary network proteomics and its requirements. BIOLOGY 2013; 3:22-38. [PMID: 24833333 PMCID: PMC4009760 DOI: 10.3390/biology3010022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/12/2013] [Revised: 12/15/2013] [Accepted: 12/16/2013] [Indexed: 01/10/2023]
Abstract
The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics on the other hand has to develop further on two critical factors: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1) we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically, the need for context-specific networks and essential considerations for localized network analysis.
|