1
Plouviez M, Dubreucq E. Key Proteomics Tools for Fundamental and Applied Microalgal Research. Proteomes 2024; 12:13. [PMID: 38651372 PMCID: PMC11036299 DOI: 10.3390/proteomes12020013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 03/28/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024]
Abstract
Microscopic, photosynthetic prokaryotes and eukaryotes, collectively referred to as microalgae, are widely studied to improve our understanding of key metabolic pathways (e.g., photosynthesis) and to develop biotechnological applications. Omics technologies, now common tools in biological research, have proved critical in microalgal research. In the past decade, significant technological advances have made omics technologies more affordable and efficient, generating huge datasets. In particular, whereas decades ago studies focused on one or a few proteins, it is now possible to study the whole proteome of a microalga. The development of mass spectrometry-based methods has provided this leap forward through the high-throughput identification and quantification of proteins. This review provides an overview of the use of proteomics in fundamental (e.g., photosynthesis) and applied (e.g., lipid production for biofuel) microalgal research, and presents future research directions in this field.
Affiliation(s)
- Maxence Plouviez
- School of Agriculture and Environment, Massey University, Palmerston North 4410, New Zealand
- The Cawthron Institute, Nelson 7010, New Zealand
- Eric Dubreucq
- Agropolymer Engineering and Emerging Technologies, L’Institut Agro Montpellier, 34060 Montpellier, France
2
Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022; 22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]
Abstract
Proteomics data are often plagued by missingness. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods rest on different prior assumptions (e.g., that data are normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), so their performance varies even on the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide this choice, we provide a decision chart that facilitates strategic considerations for datasets with different characteristics. We also draw attention to other issues that can impact proper MVI, such as the presence of confounders (e.g., batch effects), which can influence MVI performance; these, too, should be considered during or before MVI.
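The abstract's central point is that MVI methods encode different assumptions about why values are missing. As a minimal sketch (not one of the methods benchmarked in the paper), the Python below contrasts two simple strategies on one protein's raw, non-log intensities, with `None` marking a missing value: mean imputation implicitly assumes values are missing at random, while half-minimum imputation assumes they are missing because they fell below the detection limit. The function names are hypothetical, and both assume at least one observed value per protein.

```python
from statistics import mean
from typing import List, Optional

# One protein's raw (non-log) intensities across samples; None marks a missing value.
Row = List[Optional[float]]


def impute_mean(row: Row) -> List[float]:
    """Fill gaps with the mean of observed values (implicitly assumes random missingness)."""
    observed = [v for v in row if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in row]


def impute_half_min(row: Row) -> List[float]:
    """Fill gaps with half the smallest observed value (assumes left-censoring,
    i.e. the protein was present but below the detection limit)."""
    observed = [v for v in row if v is not None]
    fill = min(observed) / 2
    return [fill if v is None else v for v in row]


row = [10.0, None, 30.0, 20.0]
print(impute_mean(row))
print(impute_half_min(row))
```

On `[10.0, None, 30.0, 20.0]`, mean imputation fills 20.0 whereas half-minimum imputation fills 5.0, so the choice directly shifts any downstream fold change for low-abundance proteins.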
Affiliation(s)
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; Centre for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
3
Resolving missing protein problems using functional class scoring. Sci Rep 2022; 12:11358. [PMID: 35790756 PMCID: PMC9256666 DOI: 10.1038/s41598-022-15314-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 06/22/2022] [Indexed: 11/29/2022]
Abstract
Despite technological advances in proteomics, incomplete coverage and inconsistency issues persist, resulting in “data holes”. These data holes cause the missing protein problem (MPP), where relevant proteins are persistently unobserved, or sporadically observed across samples, hindering biomarker discovery and proper functional characterization. Network-based approaches can provide powerful solutions for resolving these issues. Functional Class Scoring (FCS) is one such method that uses protein complex information to recover missing proteins with weak support. However, FCS has not been evaluated on more recent proteomic technologies with higher coverage, and there is no clear way to evaluate its performance. To address these issues, we devised a more rigorous evaluation schema based on cross-verification between technical replicates and evaluated its performance on data acquired under recent Data-Independent Acquisition (DIA) technologies (viz. SWATH). Although cross-replicate examination reveals some inconsistencies amongst same-class samples, tissue-differentiating signal is nonetheless strongly conserved, confirming that FCS selects for biologically meaningful networks. We also report that predicted missing proteins are statistically significant based on FCS p values. Despite limited cross-replicate verification rates, the predicted missing proteins as a whole have higher peptide support than non-predicted proteins. FCS also predicts missing proteins that are often lost due to weak specific peptide support.
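The core idea of complex-based recovery can be illustrated with a generic permutation test. The sketch below is an assumption-laden stand-in, not the published FCS scoring: it pools detected proteins into one `observed` set, asks whether a complex contains more detected members than size-matched random protein sets drawn from the same universe, and, if so, flags the undetected members as candidate missing proteins. The function names, the 1,000 permutations and the 0.05 cutoff are all hypothetical.

```python
import random
from typing import Set


def complex_permutation_pvalue(complex_members: Set[str], observed: Set[str],
                               universe: Set[str], n_permutations: int = 1000,
                               seed: int = 0) -> float:
    """Empirical p-value: how often a random, size-matched protein set contains at
    least as many detected proteins as the complex does."""
    rng = random.Random(seed)
    size = len(complex_members)
    hits = len(complex_members & observed)
    pool = sorted(universe)  # fixed order so results are reproducible for a given seed
    as_extreme = 0
    for _ in range(n_permutations):
        random_set = rng.sample(pool, size)
        if sum(p in observed for p in random_set) >= hits:
            as_extreme += 1
    # add-one correction: a finite number of permutations never yields p = 0
    return (as_extreme + 1) / (n_permutations + 1)


def predict_missing(complex_members: Set[str], observed: Set[str],
                    universe: Set[str], alpha: float = 0.05) -> Set[str]:
    """Flag undetected members of a significantly 'present' complex as candidates."""
    p = complex_permutation_pvalue(complex_members, observed, universe)
    return complex_members - observed if p < alpha else set()


universe = {f"P{i}" for i in range(100)}
observed = {f"P{i}" for i in range(20)}       # 20% of proteins detected
cplx = {f"P{i}" for i in range(7)} | {"P50"}  # 7 of 8 members detected
print(predict_missing(cplx, observed, universe))
```

Because the null draws are size-matched, a complex with many detected members stands out only if that count is unusual for its size; a complex with no detected members always gets p = 1.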
4
Ruhle M, Espinal-Enríquez J, Hernández-Lemus E. The Breast Cancer Protein Co-Expression Landscape. Cancers (Basel) 2022; 14:cancers14122957. [PMID: 35740621 PMCID: PMC9221059 DOI: 10.3390/cancers14122957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 06/03/2022] [Accepted: 06/11/2022] [Indexed: 11/16/2022]
Abstract
Simple Summary: Proteins are among the most fundamental building blocks and molecular players behind the functions of cells and tissues. Their abundance and interaction patterns shape, to a large extent, what happens at the cellular and organ levels. This is also true of tumor tissues. In this work, we explored the patterns of abundance and co-occurrence of a large number of proteins in breast cancer cells and their healthy counterparts. We identified the main differences and examined whether they may be associated with relevant aspects of the biology of these tumors. Our final goal is to provide information that empowers cancer clinicians and pharmacologists to develop better diagnostic, prognostic, and therapeutic tools.
Abstract: Breast cancer is a complex phenotype (or better yet, several complex phenotypes) characterized by the interplay of a large number of cellular and biomolecular entities. Biological networks have been successfully used to capture some of the heterogeneity of intricate pathophenotypes, including cancer. Gene co-expression networks, in particular, have been used to study large-scale regulatory patterns. Ultimately, biological processes are carried out by proteins and their complexes. However, to date, most tumor profiling research has focused on genomic and transcriptomic information. Here, we tried to expand this profiling by analyzing open proteomic data with mutual information co-expression networks. We observed distinctive biological processes associated with communities of these networks, and found that some transcriptional co-expression phenomena are lost at the protein level. These kinds of data and network analyses are a broad resource for exploring cellular behavior and for cancer research.
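Mutual information (MI) co-expression networks link two proteins when knowing one abundance profile reduces uncertainty about the other, which also captures non-linear relationships that Pearson correlation can miss. The sketch below assumes equal-width discretization into three bins, a plug-in MI estimator in bits, and a fixed MI threshold for keeping an edge; the estimator, binning and threshold used in the cited study may differ, and all names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def discretize(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Equal-width binning of one abundance profile; a constant profile maps to bin 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of an extra one
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in MI estimate, in bits, between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mi_network(profiles: Dict[str, Sequence[float]], threshold: float,
               n_bins: int = 3) -> List[Tuple[str, str, float]]:
    """Undirected edges (protein_a, protein_b, MI) whose MI meets the threshold."""
    binned = {name: discretize(vals, n_bins) for name, vals in profiles.items()}
    edges = []
    for a, b in combinations(sorted(binned), 2):
        mi = mutual_information(binned[a], binned[b])
        if mi >= threshold:
            edges.append((a, b, mi))
    return edges


profiles = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],  # same ordering as A
    "C": [5, 5, 5, 5, 5, 5],    # constant, carries no information
}
print(mi_network(profiles, threshold=1.0))
```

Here only the A-B edge survives a 1.0-bit threshold: A and B fall into identical bins (MI = log2 3, about 1.585 bits), while the constant profile C shares no information with either.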
Affiliation(s)
- Martín Ruhle
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico; (M.R.); (J.E.-E.)
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico; (M.R.); (J.E.-E.)
- Center for Complexity Sciences, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico; (M.R.); (J.E.-E.)
- Center for Complexity Sciences, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Correspondence: ; Tel.: +52-5555-5350-1970
5
Kong W, Wong BJH, Gao H, Guo T, Liu X, Du X, Wong L, Goh WWB. PROTREC: A probability-based approach for recovering missing proteins based on biological networks. J Proteomics 2022; 250:104392. [PMID: 34626823 DOI: 10.1016/j.jprot.2021.104392] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/18/2022]
Abstract
A novel network-based approach for predicting missing proteins (MPs) is proposed here. This approach, PROTREC (short for PROtein RECovery), dominates existing network-based methods, such as Functional Class Scoring (FCS), Hypergeometric Enrichment (HE), and Gene Set Enrichment Analysis (GSEA), across a variety of proteomics datasets derived from different proteomics data acquisition paradigms: higher PROTREC scores are much more closely correlated with higher recovery rates of MPs across sample replicates. The PROTREC score, unlike methods reporting p-values, can be directly interpreted as the probability that an unreported protein in a proteomic screen is actually present in the sample being screened. SIGNIFICANCE: Mass spectrometry (MS) has developed rapidly in recent years; however, a substantial proportion of proteins still goes undetected, leading to missing protein problems. A few existing protein recovery methods are based on biological networks, but their performance is unsatisfactory. We propose a new protein recovery method, PROTREC, a Bayesian-inspired approach based on biological networks, which shows exceptional performance across multiple validation strategies. Because it does not rely on peptide information, it avoids the ambiguity issue that most protein assembly methods face.
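Hypergeometric Enrichment (HE), one of the baselines PROTREC is compared against, asks how surprising it is to see `hits` detected proteins in a complex of a given size when `n_observed` of `universe_size` proteins were detected overall. A minimal sketch of that upper-tail p-value follows (the function name is hypothetical). Its output is a p-value for the whole complex, which is exactly what the abstract contrasts with PROTREC's score, interpretable as the probability that an individual unreported protein is present; PROTREC's own formula is not reproduced here.

```python
from math import comb


def hypergeometric_enrichment_pvalue(universe_size: int, n_observed: int,
                                     complex_size: int, hits: int) -> float:
    """P(X >= hits), where X is the number of detected proteins in a random
    complex-sized draw (without replacement) from the protein universe."""
    upper = min(complex_size, n_observed)
    total = comb(universe_size, complex_size)
    return sum(
        comb(n_observed, k) * comb(universe_size - n_observed, complex_size - k)
        for k in range(hits, upper + 1)
    ) / total


# 10 proteins in the universe, 5 detected, a 3-member complex with all 3 detected
print(hypergeometric_enrichment_pvalue(10, 5, 3, 3))
```

In the toy call, only C(5,3) = 10 of the C(10,3) = 120 possible 3-protein draws are fully detected, so p = 1/12.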
Affiliation(s)
- Weijia Kong
- School of Biological Sciences, Nanyang Technological University, Singapore; Department of Computer Science, National University of Singapore, Singapore
- Huanhuan Gao
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Xianming Liu
- Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Xiaoxian Du
- Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore.
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.
6
Goh WWB, Wong L. The Birth of Bio-data Science: Trends, Expectations, and Applications. Genomics Proteomics Bioinformatics 2020; 18:5-15. [PMID: 32428604 PMCID: PMC7393550 DOI: 10.1016/j.gpb.2020.01.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/31/2019] [Revised: 12/02/2019] [Accepted: 02/26/2020] [Indexed: 12/23/2022]
Affiliation(s)
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore.
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore 117417, Singapore.
7
Zeng W, Niu L, Wang Z, Wang X, Wang Y, Pan L, Lu Z, Cui G, Weng W, Wang M, Meng X, Wang Z. Application of an antibody chip for screening differentially expressed proteins during peach ripening and identification of a metabolon in the SAM cycle to generate a peach ethylene biosynthesis model. Hortic Res 2020; 7:31. [PMID: 32194967 PMCID: PMC7072073 DOI: 10.1038/s41438-020-0249-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/15/2019] [Revised: 11/27/2019] [Accepted: 01/07/2020] [Indexed: 05/21/2023]
Abstract
Peach (Prunus persica) is a typical climacteric fruit that produces ethylene rapidly during ripening, and its fruit softens quickly. Stony hard peach cultivars, however, do not produce large amounts of ethylene, and the fruit remains firm until fully ripe, thus differing from melting flesh peach cultivars. To identify the key proteins involved in peach fruit ripening, an antibody-based proteomic analysis was conducted. A mega-monoclonal antibody (mAb) library was generated and arrayed on a chip (mAbArray) at a high density, covering ~4950 different proteins of peach. Through the screening of peach fruit proteins with the mAbArray chip, differentially expressed proteins recognized by 1587 mAbs were identified, and 33 corresponding antigens were ultimately identified by immunoprecipitation and mass spectrometry. These proteins included not only important enzymes involved in ethylene biosynthesis, such as ACO1, SAHH, SAMS, and MetE, but also novel factors such as NUDT2. Furthermore, protein-protein interaction analysis identified a metabolon containing SAHH and MetE. By combining the antibody-based proteomic data with the transcriptomic and metabolic data, a mathematical model of ethylene biosynthesis in peach was constructed. Simulation results showed that MetE is an important regulator during peach ripening, partially through interaction with SAHH.
Affiliation(s)
- Wenfang Zeng
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Liang Niu
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Xiaobei Wang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Yan Wang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Lei Pan
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Zhenhua Lu
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Guochao Cui
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
- Xun Meng
- Abmart, 200233 Shanghai, China
- Northwest University, 710127 Xi’an, China
- Zhiqiang Wang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, 450009 Zhengzhou, China
8
Goh WWB, Wong L. Advanced bioinformatics methods for practical applications in proteomics. Brief Bioinform 2019; 20:347-355. [PMID: 30657890 DOI: 10.1093/bib/bbx128] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/14/2017] [Indexed: 12/22/2022]
Abstract
Mass spectrometry (MS)-based proteomics has advanced rapidly in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and where proteomics is needed for clinical application): peptide-spectrum matching (PSM) under the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data, and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater breadth and depth, but because it captures spectra indiscriminately, mixing signal from multiple peptides, obtaining good PSMs is difficult. We consider two strategies: simplifying DIA spectra into pseudo-data-dependent acquisition spectra or, alternatively, a brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When a protein is observed in at least one sample, imputation methods can be used to estimate its approximate expression level. If it is never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may introduce errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no single best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) confound performance. We also discuss signal detection when class effects are weak.
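The abstract names multiple-testing correction as a downstream step that confounds SFS performance. One common choice, assumed here because the abstract does not name a procedure, is the Benjamini-Hochberg false discovery rate adjustment; a minimal sketch (the function name is hypothetical):

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

For the p-values `[0.01, 0.04, 0.03, 0.2]`, the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.2]`; the running minimum keeps adjusted values monotone in the raw ranking, which is why 0.03 and 0.04 end up sharing 0.0533. Swapping in a stricter correction would shrink the selected feature set, which is one way downstream processing changes apparent SFS performance.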
9
Proteomic investigation of intra-tumor heterogeneity using network-based contextualization - A case study on prostate cancer. J Proteomics 2019; 206:103446. [PMID: 31323421 DOI: 10.1016/j.jprot.2019.103446] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/17/2019] [Revised: 06/12/2019] [Accepted: 07/08/2019] [Indexed: 12/26/2022]
Abstract
Cancer is a heterogeneous disease, confounding the identification of relevant markers and drug targets. Network-based analysis is robust against noise, potentially offering a promising approach towards biomarker identification. We describe here the application of two network-based methods, qPSP (Quantitative Proteomics Signature Profiling) and PFSNet (Paired Fuzzy SubNetworks), in an intra-tissue proteome data set of prostate tissue samples. Despite high basal variation, we find that traditional statistical analysis may exaggerate the extent of heterogeneity. We also report that network-based analysis outperforms protein-based feature selection with concomitantly higher cross-validation accuracy. Overall, network-based analysis provides emergent signal that boosts sensitivity while retaining good precision. It is a potential means of circumventing heterogeneity for stable biomarker discovery.
10
Zhao Y, Sue ACH, Goh WWB. Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data. J Bioinform Comput Biol 2019; 17:1950013. [DOI: 10.1142/s0219720019500136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We updated its performance evaluation using data derived from a new proteomics technology (SWATH) and also checked reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluated the objectivity of the FCS p-value, and followed up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective and are strongly confounded by complex size; (2) the best recovery performance does not necessarily lie at standard p-value cutoffs; (3) while predicted complexes may be used to augment MPP, they are inferior to real complexes and are further confounded by issues relating to network coverage and quality; and (4) moderately sized complexes (5 to 10 proteins) still exhibit considerable instability, and FCS works best with large complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
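Finding (1), that the p-value is confounded by complex size, can be reproduced in spirit with any tail-probability score. The sketch below uses a hypergeometric upper tail as a stand-in for the FCS p-value (an assumption; FCS computes its p-value differently) and holds the detected fraction of each complex fixed at 75% while the complex grows, in a hypothetical universe of 1,000 proteins of which 300 are detected. Function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def upper_tail(universe_size: int, n_observed: int, complex_size: int, hits: int) -> float:
    """Hypergeometric P(X >= hits): chance that a random complex-sized draw contains
    at least `hits` detected proteins."""
    total = comb(universe_size, complex_size)
    top = min(complex_size, n_observed)
    return sum(
        comb(n_observed, k) * comb(universe_size - n_observed, complex_size - k)
        for k in range(hits, top + 1)
    ) / total


def pvalues_at_fixed_fraction(sizes: Sequence[int], fraction: float = 0.75,
                              universe_size: int = 1000, n_observed: int = 300) -> List[float]:
    """p-values for complexes of different sizes that share the same detected fraction."""
    return [upper_tail(universe_size, n_observed, s, round(s * fraction)) for s in sizes]


# 3/4, 6/8 and 15/20 members detected: same fraction, different complex sizes
print(pvalues_at_fixed_fraction([4, 8, 20]))
```

The p-value shrinks as the complex grows even though the evidence per member is identical, which illustrates why a fixed cutoff can behave inconsistently across complexes of different sizes.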
Affiliation(s)
- Yaxing Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
- Andrew Chi-Hau Sue
- School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
11
Wang C, Zhao S, Shao X, Park JB, Jeong SH, Park HJ, Kwak WJ, Wei G, Kim SW. Challenges and tackles in metabolic engineering for microbial production of carotenoids. Microb Cell Fact 2019; 18:55. [PMID: 30885243 PMCID: PMC6421696 DOI: 10.1186/s12934-019-1105-1] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 12/18/2018] [Accepted: 03/08/2019] [Indexed: 02/07/2023]
Abstract
Naturally occurring carotenoids have been isolated and used as colorants, antioxidants, nutrients, etc. in many fields. Demand for carotenoids is ever growing. To meet this demand, microbial production of carotenoids is an attractive alternative to current extraction from natural sources. This review summarizes the biosynthetic pathway of carotenoids and progress in the metabolic engineering of various microorganisms for carotenoid production. Advances in synthetic pathways and systems biology have made many versatile engineering tools available for manipulating microorganisms. In this context, challenges and possible directions are also discussed to provide insight into microbial engineering for improved carotenoid production in the future.
Affiliation(s)
- Chonglong Wang
- School of Biology and Basic Medical Sciences, Soochow University, 199 Renai Road, Suzhou, 215123, People's Republic of China.
- Shuli Zhao
- School of Biology and Basic Medical Sciences, Soochow University, 199 Renai Road, Suzhou, 215123, People's Republic of China
- Xixi Shao
- School of Biology and Basic Medical Sciences, Soochow University, 199 Renai Road, Suzhou, 215123, People's Republic of China
- Ji-Bin Park
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, 501 Jinju-daero, Jinju, 52828, Republic of Korea
- Seong-Hee Jeong
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, 501 Jinju-daero, Jinju, 52828, Republic of Korea
- Hyo-Jin Park
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, 501 Jinju-daero, Jinju, 52828, Republic of Korea
- Won-Ju Kwak
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, 501 Jinju-daero, Jinju, 52828, Republic of Korea
- Gongyuan Wei
- School of Biology and Basic Medical Sciences, Soochow University, 199 Renai Road, Suzhou, 215123, People's Republic of China
- Seon-Won Kim
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, 501 Jinju-daero, Jinju, 52828, Republic of Korea.
12
Wang C, Liwei M, Park JB, Jeong SH, Wei G, Wang Y, Kim SW. Microbial Platform for Terpenoid Production: Escherichia coli and Yeast. Front Microbiol 2018; 9:2460. [PMID: 30369922 PMCID: PMC6194902 DOI: 10.3389/fmicb.2018.02460] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 06/07/2018] [Accepted: 09/25/2018] [Indexed: 11/13/2022]
Abstract
Terpenoids, also called isoprenoids, are a large and highly diverse family of natural products with important medical and industrial properties. However, limited production of terpenoids from natural resources constrains their use as either bulk commodities or high-value products. Microbial production of terpenoids in Escherichia coli and yeasts provides a promising alternative owing to the genetic tools available for pathway engineering and genome editing, and a comprehensive understanding of their metabolism. This review summarizes recent progress in engineering the industrial model strains E. coli and yeasts for terpenoid production. With advances in synthetic biology and systems biology, both are expected to show great potential as platforms for terpenoid synthesis.
Affiliation(s)
- Chonglong Wang
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Mudanguli Liwei
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Ji-Bin Park
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, Jinju, South Korea
- Seong-Hee Jeong
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, Jinju, South Korea
- Gongyuan Wei
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Yujun Wang
- Department of Marine Science, Qinzhou University, Qinzhou, China
- Seon-Won Kim
- Division of Applied Life Science (BK21 Plus), PMBBRC, Gyeongsang National University, Jinju, South Korea
13
Why breast cancer signatures are no better than random signatures explained. Drug Discov Today 2018; 23:1818-1823. [PMID: 29864526 DOI: 10.1016/j.drudis.2018.05.036] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/16/2018] [Revised: 05/14/2018] [Accepted: 05/29/2018] [Indexed: 12/30/2022]
Abstract
Random signature superiority (RSS) occurs when random gene signatures outperform published and/or known signatures. Unlike reproducibility and generalizability issues, RSS is relatively underexplored, yet understanding it is imperative for better analytical outcomes. In breast cancer, RSS correlates strongly with enrichment for proliferation genes and with signature size. Removing proliferation genes from random signatures reduces their predictive power. Almost all genes are correlated to some extent with the proliferation signature, making complete elimination of its confounding effects impossible. RSS extends beyond breast cancer: it is especially strong in other cancers, in a platform-independent manner, and less severe, but present nonetheless, in nonproliferative diseases.
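RSS is detected by benchmarking a signature against size-matched random signatures. A minimal sketch, assuming each gene already carries a hypothetical per-gene association score and that a signature's score is the mean of its genes' scores; in the cited work, performance is predictive power on clinical outcome, which this stand-in does not model. All names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, Sequence


def signature_score(signature: Sequence[str], gene_scores: Dict[str, float]) -> float:
    """Hypothetical stand-in for signature performance: mean per-gene score."""
    return mean(gene_scores[g] for g in signature)


def random_signature_fraction(signature: Sequence[str], gene_scores: Dict[str, float],
                              n_random: int = 1000, seed: int = 0) -> float:
    """Fraction of size-matched random signatures scoring at least as well."""
    rng = random.Random(seed)
    genes = sorted(gene_scores)
    target = signature_score(signature, gene_scores)
    at_least_as_good = sum(
        signature_score(rng.sample(genes, len(signature)), gene_scores) >= target
        for _ in range(n_random)
    )
    return at_least_as_good / n_random


gene_scores = {f"G{i}": i / 100 for i in range(100)}
print(random_signature_fraction([f"G{i}" for i in range(95, 100)], gene_scores))
print(random_signature_fraction([f"G{i}" for i in range(5)], gene_scores))
```

A fraction near 0 means the signature beats almost every random one; RSS corresponds to a fraction that is not small. If a broad confounder such as proliferation inflates many genes' scores, the random baseline rises with it.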
14
Zhou L, Wong L, Goh WWB. Understanding missing proteins: a functional perspective. Drug Discov Today 2018; 23:644-651. [DOI: 10.1016/j.drudis.2017.11.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/07/2017] [Revised: 10/24/2017] [Accepted: 11/13/2017] [Indexed: 01/03/2023]
15
Dealing with Confounders in Omics Analysis. Trends Biotechnol 2018; 36:488-498. [PMID: 29475622 DOI: 10.1016/j.tibtech.2018.01.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/05/2017] [Revised: 01/28/2018] [Accepted: 01/29/2018] [Indexed: 01/05/2023]
Abstract
The Anna Karenina effect is a manifestation of the theory-practice gap that exists when theoretical statistics are applied to real-world data. When biological data are analyzed for differential features such as genes or proteins, it arises when the null hypothesis is rejected for extraneous reasons (confounders) rather than because the alternative hypothesis is relevant to the disease phenotype. The mechanics of applying statistical tests must therefore address and resolve confounders; simply manipulating the P-value is inadequate. We discuss three mechanistic elements (hypothesis statement construction, null distribution appropriateness, and test-statistic construction) and suggest how they can be designed to foil the Anna Karenina effect and select phenotypically relevant biological features.
16
Abstract
Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/ , and online documentation is available at http://rpubs.com/gohwils/204259 .
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551; Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417; Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074
17
Goh WWB, Wong L. Class-paired Fuzzy SubNETs: A paired variant of the rank-based network analysis family for feature selection based on protein complexes. Proteomics 2017; 17:e1700093. [PMID: 28390171 DOI: 10.1002/pmic.201700093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/09/2017] [Accepted: 04/05/2017] [Indexed: 01/12/2023]
Abstract
Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank-defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.
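Gene fuzzy scoring (GFS), as described in the abstract, replaces raw abundances with rank-defined weights within each sample. A minimal sketch, assuming a two-threshold form in which proteins in the top `theta1` fraction of a sample get weight 1, weights fall linearly to 0 between `theta1` and `theta2`, and all other proteins get 0. The 5% and 15% defaults and the summed complex score are assumptions for illustration, and the statistical tests that PFSNET and PPFSNET apply on top are not shown.

```python
from typing import Dict, Iterable


def gene_fuzzy_scores(abundances: Dict[str, float], theta1: float = 0.05,
                      theta2: float = 0.15) -> Dict[str, float]:
    """Rank-based weights for one sample: 1 in the top theta1 fraction, falling
    linearly to 0 at theta2, and 0 below that."""
    ranked = sorted(abundances, key=abundances.get, reverse=True)
    n = len(ranked)
    scores: Dict[str, float] = {}
    for idx, protein in enumerate(ranked):
        q = idx / n  # fraction of proteins ranked above this one
        if q < theta1:
            scores[protein] = 1.0
        elif q < theta2:
            scores[protein] = (theta2 - q) / (theta2 - theta1)
        else:
            scores[protein] = 0.0
    return scores


def complex_score(members: Iterable[str], fuzzy: Dict[str, float]) -> float:
    """Hypothetical complex-level score: sum of member weights (undetected members count 0)."""
    return sum(fuzzy.get(p, 0.0) for p in members)


sample = {f"P{i}": float(20 - i) for i in range(20)}  # P0 is the most abundant
weights = gene_fuzzy_scores(sample)
print(weights["P0"], weights["P2"], complex_score({"P0", "P2", "P10"}, weights))
```

With 20 proteins ranked P0 (most abundant) to P19, P0 and P1 receive 1.0, P2 receives 0.5, and P3 onward receive 0.0. Because only within-sample ranks matter, any order-preserving shift or scaling of a whole sample leaves the weights unchanged.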
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, P. R. China
- Department of Computer Science, National University of Singapore, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore
- Department of Pathology, National University of Singapore, Singapore
18
Goh WWB, Wong L. Protein complex-based analysis is resistant to the obfuscating consequences of batch effects – a case study in clinical proteomics. BMC Genomics 2017; 18:142. [PMID: 28361693; PMCID: PMC5374662; DOI: 10.1186/s12864-017-3490-3]
Abstract
Background: In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research. Results: Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs highly correlated with batch but not class effect. However, neither PC-based nor existing batch effect-correction methods deal well with subtle batch effects, which are difficult to eradicate, and both involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch-effect resistant methods and demonstrate how such methods incorporating protein complexes are particularly resistant to batch effects without compromising data integrity. Conclusions: Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance to batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3490-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, 300072, People's Republic of China
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
- Department of Pathology, National University of Singapore, Singapore
19
Wang W, Sue ACH, Goh WWB. Feature selection in clinical proteomics: with great power comes great reproducibility. Drug Discov Today 2016; 22:912-918. [PMID: 27988358; DOI: 10.1016/j.drudis.2016.12.006] [Received: 10/12/2016; Revised: 11/27/2016; Accepted: 12/08/2016]
Abstract
In clinical proteomics, reproducible feature selection is unattainable given the standard statistical hypothesis-testing framework. This leads to irreproducible signatures with no diagnostic power. Instability stems from high P-value variability (p_var), which is inevitable and insolvable. The impact of p_var can be reduced via power increment, for example by increasing sample size and measurement accuracy. However, these are not realistic solutions in practice. Instead, workarounds using existing data, such as signal-boosting transformation techniques and network-based statistical testing, are more practical. Furthermore, it is useful to consider other metrics alongside P-values, including confidence intervals, effect sizes and cross-validation accuracies, to make informed inferences.
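P-value variability is easy to see by simulation: repeat the "same" experiment many times with a fixed true effect and look at the spread of the resulting p-values. The sketch below is an illustration only, not the authors' analysis. For simplicity it assumes normally distributed measurements with a known common standard deviation and uses a two-sample z-test (via `statistics.NormalDist`) instead of a t-test; the effect size, sample sizes and helper names are hypothetical.

```python
import math
import random
import statistics

STD_NORMAL = statistics.NormalDist()


def z_test_p(a, b, sigma=1.0):
    """Two-sided two-sample z-test p-value, assuming a known common sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    return 2 * STD_NORMAL.cdf(-abs(z))


def replicate_p_values(effect, n, reps=2000, seed=0):
    """p-values from `reps` repeated experiments with the same true effect."""
    rng = random.Random(seed)
    ps = []
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(effect, 1.0) for _ in range(n)]
        ps.append(z_test_p(a, b))
    return ps


def summarize(ps, alpha=0.05):
    """10th percentile, median, 90th percentile and power at `alpha`."""
    deciles = statistics.quantiles(ps, n=10)
    power = sum(p < alpha for p in ps) / len(ps)
    return deciles[0], statistics.median(ps), deciles[-1], power
```

Comparing `summarize(replicate_p_values(1.0, 5))` with `summarize(replicate_p_values(1.0, 20))` shows the abstract's point: the p-values of identical experiments spread over orders of magnitude, and increasing the sample size (a power increment) shifts them down and raises the fraction of significant replicates, without eliminating the spread.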
Affiliation(s)
- Wei Wang
- School of Pharmaceutical Science and Technology, Tianjin University, China
- Andrew C-H Sue
- School of Pharmaceutical Science and Technology, Tianjin University, China
- Wilson W B Goh
- School of Pharmaceutical Science and Technology, Tianjin University, China
- Department of Bioengineering, Tianjin University, China
- Department of Computer Science, National University of Singapore, Singapore
20
Goh WWB. Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics. BMC Med Genomics 2016; 9:67. [PMID: 28117654; PMCID: PMC5260792; DOI: 10.1186/s12920-016-0228-z]
Abstract
Background: The hypergeometric enrichment analysis approach typically fares poorly in feature-selection stability due to its upstream reliance on the t-test to generate differential protein lists before testing for enrichment on a protein complex, subnetwork or gene group. Methods: Replacing the t-test with a fuzzy rank-based weight system similar to that used in network-based methods like Quantitative Proteomics Signature Profiling (QPSP), Fuzzy SubNets (FSNET) and paired FSNET (PFSNET) produces dramatic improvements. Results: This approach, Fuzzy-FishNET, exhibits high precision-recall over three sets of simulated data (with simulated protein complexes) while excelling in feature-selection reproducibility on real data (based on evaluation with real protein complexes). Overlap comparisons with PFSNET show that Fuzzy-FishNET selects the most significant complexes, which are also strongly class-discriminative. Cross-validation further demonstrates that Fuzzy-FishNET selects class-relevant protein complexes. Conclusions: Based on evaluation with simulated and real datasets, Fuzzy-FishNET is a significant upgrade of the traditional hypergeometric enrichment approach and a powerful new entrant amongst comparative proteomics analysis methods. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-016-0228-z) contains supplementary material, which is available to authorized users.
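For reference, the traditional baseline that Fuzzy-FishNET improves on can be sketched as a one-sided hypergeometric test: given a differential protein list (assumed here to come from an upstream t-test), ask whether a complex's members are over-represented in it. This is only that baseline, not Fuzzy-FishNET itself; the function names are hypothetical, and the example restricts both lists to a user-supplied protein universe.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def complex_enrichment_p(differential, members, universe):
    """One-sided over-representation p-value of a complex among differential proteins."""
    universe = set(universe)
    diff = set(differential) & universe
    members = set(members) & universe
    overlap = len(diff & members)
    return hypergeom_sf(overlap, len(universe), len(members), len(diff))
```

Because the test only sees set membership, a protein that narrowly misses the t-test cut-off in one dataset and passes it in another changes the overlap count, which is the instability the abstract attributes to the upstream t-test.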
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, People's Republic of China.
21
Goh WWB, Wong L. Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms. J Proteome Res 2016; 15:3167-79. [DOI: 10.1021/acs.jproteome.6b00402]
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 117417