1
Bendel AM, Skendo K, Klein D, Shimada K, Kauneckaite-Griguole K, Diss G. Optimization of a deep mutational scanning workflow to improve quantification of mutation effects on protein-protein interactions. BMC Genomics 2024; 25:630. [PMID: 38914936 PMCID: PMC11194945 DOI: 10.1186/s12864-024-10524-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 06/14/2024] [Indexed: 06/26/2024] Open
Abstract
Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might non-linearly distort the functional score obtained, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, in order to identify technical sources of non-linearities. We found that parameters common to many DMS assays, such as amount of transformed DNA, timepoint of harvest and library composition, can cause non-linearities in the data. Designing experiments to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.
Affiliation(s)
- Alexandra M Bendel
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- University of Basel, Basel, Switzerland
- Dominique Klein
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Kenji Shimada
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Kotryna Kauneckaite-Griguole
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- University of Basel, Basel, Switzerland
- Guillaume Diss
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland.
2
Wu W, Huang Z, Kong W, Peng H, Goh WWB. Optimizing the PROTREC network-based missing protein prediction algorithm. Proteomics 2024; 24:e2200332. [PMID: 37876146 DOI: 10.1002/pmic.202200332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 09/30/2023] [Accepted: 10/06/2023] [Indexed: 10/26/2023]
Abstract
This article summarizes the PROTREC method and investigates the impact that the different hyper-parameters have on the task of missing protein prediction using PROTREC. We evaluate missing protein recovery rates using different PROTREC score selection approaches (MAX, MIN, MEDIAN, and MEAN), different PROTREC score thresholds, as well as different complex size thresholds. In addition, we included two additional cancer datasets in our analysis and introduced a new validation method to check both the robustness of the PROTREC method as well as the correctness of our analysis. Our analysis showed that the missing protein recovery rate can be improved by adopting PROTREC score selection operations of MIN, MEDIAN, and MEAN instead of the default MAX. However, this may come at a cost of reduced numbers of proteins predicted and validated. The users should therefore choose their hyper-parameters carefully to find a balance in the accuracy-quantity trade-off. We also explored the possibility of combining PROTREC with a p-value-based method (FCS) and demonstrated that PROTREC is able to perform well independently without any help from a p-value-based method. Furthermore, we conducted a downstream enrichment analysis to understand the biological pathways and protein networks within the cancerous tissues using the recovered proteins.
- Missing protein recovery rate using PROTREC can be improved by selecting a different PROTREC score selection method.
- Different PROTREC score selection methods and other hyper-parameters such as PROTREC score threshold and complex size threshold introduce accuracy-quantity trade-off.
- PROTREC is able to perform well independently of any filtering using a p-value-based method.
- Verification of the PROTREC method on additional cancer datasets.
- Downstream Enrichment Analysis to understand the biological pathways and protein networks in cancerous tissues.
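The score selection operations named in the abstract above (MAX, MIN, MEDIAN, MEAN) can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each protein receives one score per protein complex it belongs to and that these per-complex scores are collapsed into a single value before thresholding. The data, the function names `select_scores` and `predict`, and the threshold are hypothetical and are not taken from the PROTREC implementation.

```python
from statistics import mean, median

# Hypothetical per-complex scores for each protein (illustrative values only).
complex_scores = {
    "P1": [0.95, 0.40, 0.70],
    "P2": [0.85, 0.80],
    "P3": [0.30],
}

# The four selection operations mentioned in the abstract.
SELECTORS = {"MAX": max, "MIN": min, "MEDIAN": median, "MEAN": mean}


def select_scores(scores_by_protein, method):
    """Collapse each protein's per-complex scores into one score."""
    pick = SELECTORS[method]
    return {protein: pick(scores) for protein, scores in scores_by_protein.items()}


def predict(scores_by_protein, method, threshold):
    """Return the proteins whose selected score meets the (assumed) threshold."""
    selected = select_scores(scores_by_protein, method)
    return sorted(p for p, s in selected.items() if s >= threshold)


for method in SELECTORS:
    print(method, predict(complex_scores, method, threshold=0.75))
```

With this toy data, MAX keeps two proteins while the other three operations keep one, which is one way to picture the accuracy-quantity trade-off the abstract describes: stricter selection yields fewer predictions.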
Affiliation(s)
- Wenshan Wu
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Zelu Huang
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
- Department of Computer Science, National University of Singapore, Singapore, Singapore
- School of Biological Science, Nanyang Technological University, Singapore, Singapore
- Hui Peng
- School of Biological Science, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
- School of Biological Science, Nanyang Technological University, Singapore, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
3
ProInfer: An interpretable protein inference tool leveraging on biological networks. PLoS Comput Biol 2023; 19:e1010961. [PMID: 36930671 PMCID: PMC10057851 DOI: 10.1371/journal.pcbi.1010961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 03/29/2023] [Accepted: 02/20/2023] [Indexed: 03/18/2023] Open
Abstract
In mass spectrometry (MS)-based proteomics, protein inference from identified peptides (protein fragments) is a critical step. We present ProInfer (Protein Inference), a novel protein assembly method that takes advantage of information in biological networks. ProInfer assists recovery of proteins supported only by ambiguous peptides (peptides that map to more than one candidate protein) and enhances the statistical confidence for proteins supported by both unique and ambiguous peptides. Consequently, ProInfer rescues weakly supported proteins, thereby improving proteome coverage. Evaluated across THP1 cell line, lung cancer and RAW264.7 datasets, ProInfer always infers the largest number of true positives in comparison to the mainstream protein inference tools Fido, EPIFANY and PIA. ProInfer is also adept at retrieving differentially expressed proteins, signifying its usefulness for functional analysis and phenotype profiling. The ProInfer source code is available at https://github.com/PennHui2016/ProInfer.
4
Babu G, Nobel FA. Identification of differentially expressed genes and their major pathways among the patient with COVID-19, cystic fibrosis, and chronic kidney disease. Informatics in Medicine Unlocked 2022; 32:101038. [PMID: 35966126 PMCID: PMC9357445 DOI: 10.1016/j.imu.2022.101038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 07/31/2022] [Accepted: 08/01/2022] [Indexed: 11/19/2022] Open
Abstract
The SARS-CoV-2 virus causes coronavirus disease (COVID-19), an infectious disease. The majority of people who are infected with this virus will have mild to moderate respiratory symptoms. Multiple studies have proved that there is a substantial pathophysiological link between COVID-19 and comorbidities such as cystic fibrosis (CF) and chronic kidney disease (CKD). In this study, we attempted to identify differentially expressed genes, as well as the genes shared among these conditions, in order to comprehend their compatibility. Gene expression profiling indicated that 849 genes were mutually exclusive, and functional analysis was done within the context of gene ontology and key pathway involvement. Three genes (PRPF31, FOXN2, and RIOK3) were commonly upregulated in the analysed datasets of the three disease categories. These genes could be potential biomarkers for patients with COVID-19 and cystic fibrosis, and COVID-19 and chronic kidney disease. Further extensive analyses have been performed to describe how these genes are regulated by various transcription factors and microRNAs. Our analyses then revealed six hub genes (PRPF31, FOXN2, RIOK3, UBC, HNF4A, and ELAVL). As they were involved in the interaction between COVID-19 and patients with CF and CKD, they could help researchers identify potential therapeutic molecules. Some drugs have been predicted based on the upregulated genes, which may have a significant impact on reducing the burden of these diseases in the future.
Affiliation(s)
- Golap Babu
- Department of Biochemistry and Molecular Biology, Jahangirnagar University, Savar, Dhaka, 1342, Bangladesh
- Fahim Alam Nobel
- Department of Biochemistry and Molecular Biology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
5
Resolving missing protein problems using functional class scoring. Sci Rep 2022; 12:11358. [PMID: 35790756 PMCID: PMC9256666 DOI: 10.1038/s41598-022-15314-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 06/22/2022] [Indexed: 11/29/2022] Open
Abstract
Despite technological advances in proteomics, incomplete coverage and inconsistency issues persist, resulting in “data holes”. These data holes cause the missing protein problem (MPP), where relevant proteins are persistently unobserved, or sporadically observed across samples, hindering biomarker discovery and proper functional characterization. Network-based approaches can provide powerful solutions for resolving these issues. Functional Class Scoring (FCS) is one such method that uses protein complex information to recover missing proteins with weak support. However, FCS has not been evaluated on more recent proteomic technologies with higher coverage, and there is no clear way to evaluate its performance. To address these issues, we devised a more rigorous evaluation schema based on cross-verification between technical replicates and evaluated its performance on data acquired under recent Data-Independent Acquisition (DIA) technologies (viz. SWATH). Although cross-replicate examination reveals some inconsistencies amongst same-class samples, tissue-differentiating signal is nonetheless strongly conserved, confirming that FCS selects for biologically meaningful networks. We also report that predicted missing proteins are statistically significant based on FCS p values. Despite limited cross-replicate verification rates, the predicted missing proteins as a whole have higher peptide support than non-predicted proteins. FCS also predicts missing proteins that are often lost due to weak specific peptide support.
6
Wang W, Liu W. PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform 2021; 22:6291946. [PMID: 34086850 DOI: 10.1093/bib/bbab212] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/12/2021] [Revised: 05/08/2021] [Accepted: 05/15/2021] [Indexed: 12/12/2022] Open
Abstract
For high-dimensional expression data, most prognostic models perform feature selection based on individual genes, which usually leads to unstable prognosis, and the identified risk genes are inherently insufficient in revealing complex molecular mechanisms. Since most genes carry out cellular functions by forming protein complexes, the basic representatives of functional modules, identifying risk protein complexes may greatly improve our understanding of disease biology. Coupled with the fact that protein complexes have been shown to have innate resistance to batch effects and are effective predictors of disease phenotypes, constructing prognostic models and selecting features with protein complexes as the basic unit should improve the robustness and biological interpretability of the model. Here, we propose a protein complex-based, group lasso-Cox model (PCLasso) to predict patient prognosis and identify risk protein complexes. Experiments on three cancer types have proved that PCLasso has better prognostic performance than prognostic models based on individual genes. The resulting risk protein complexes not only contain individual risk genes but also incorporate close partners that synergize with them, which may help reveal molecular mechanisms related to cancer progression from a comprehensive perspective. Furthermore, a pan-cancer prognostic analysis was performed to identify risk protein complexes of 19 cancer types, which may provide novel potential targets for cancer research.
Affiliation(s)
- Wei Wang
- Heilongjiang Institute of Technology, Harbin 150050, China
- Wei Liu
- School of Science at Heilongjiang Institute of Technology, Harbin 150050, China
7
Tang X, Xiao Q, Yu K. Breast Cancer Candidate Gene Detection Through Integration of Subcellular Localization Data With Protein–Protein Interaction Networks. IEEE Trans Nanobioscience 2020; 19:556-561. [DOI: 10.1109/tnb.2020.2990178] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
8
Ho SY, Wong L, Goh WWB. Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy. Patterns 2020; 1:100025. [PMID: 33205097 PMCID: PMC7660406 DOI: 10.1016/j.patter.2020.100025] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Abstract
Class-prediction accuracy provides a quick but superficial way of determining classifier performance. It does not inform on the reproducibility of the findings or whether the selected or constructed features used are meaningful and specific. Furthermore, the class-prediction accuracy oversummarizes and does not inform on how training and learning have been accomplished: two classifiers providing the same performance in one validation can disagree on many future validations. It does not provide explainability in its decision-making process and is not objective, as its value is also affected by class proportions in the validation set. Despite these issues, this does not mean we should omit the class-prediction accuracy. Instead, it needs to be enriched with accompanying evidence and tests that supplement and contextualize the reported accuracy. This additional evidence serves as augmentations and can help us perform machine learning better while avoiding naive reliance on oversimplified metrics. There is a huge potential for machine learning, but blind reliance on oversimplified metrics can mislead. Class-prediction accuracy is a common metric used for determining classifier performance. This article provides examples to show how the class-prediction accuracy is superficial and even misleading. We propose some augmentative measures to supplement the class-prediction accuracy. This in turn helps us to better understand the quality of learning of the classifier.
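The point that accuracy is affected by class proportions in the validation set can be checked with a little arithmetic. The sketch below assumes a hypothetical classifier whose sensitivity and specificity are fixed (the values are chosen for illustration and do not come from the article) and computes its expected accuracy at several positive-class fractions.

```python
def expected_accuracy(sensitivity, specificity, positive_fraction):
    """Expected accuracy of a classifier with fixed per-class performance.

    accuracy = sensitivity * P(positive) + specificity * P(negative)
    """
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)


# Hypothetical classifier: strong on positives, weaker on negatives.
SENS, SPEC = 0.90, 0.60

for frac in (0.1, 0.5, 0.9):
    acc = expected_accuracy(SENS, SPEC, frac)
    print(f"positive fraction {frac:.1f}: accuracy {acc:.2f}")
```

Under these assumed values the same classifier scores about 0.63 on a validation set with 10% positives and about 0.87 on one with 90% positives, even though nothing about the classifier itself has changed.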
Affiliation(s)
- Sung Yang Ho
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
9
McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ, Browning KS, Chen ZJ, Ronald PC, Marcotte EM. A Pan-plant Protein Complex Map Reveals Deep Conservation and Novel Assemblies. Cell 2020; 181:460-474.e14. [PMID: 32191846 DOI: 10.1016/j.cell.2020.02.049] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 10/15/2019] [Revised: 01/08/2020] [Accepted: 02/21/2020] [Indexed: 01/11/2023]
Abstract
Plants are foundational for global ecological and economic systems, but most plant proteins remain uncharacterized. Protein interaction networks often suggest protein functions and open new avenues to characterize genes and proteins. We therefore systematically determined protein complexes from 13 plant species of scientific and agricultural importance, greatly expanding the known repertoire of stable protein complexes in plants. By using co-fractionation mass spectrometry, we recovered known complexes, confirmed complexes predicted to occur in plants, and identified previously unknown interactions conserved over 1.1 billion years of green plant evolution. Several novel complexes are involved in vernalization and pathogen defense, traits critical for agriculture. We also observed plant analogs of animal complexes with distinct molecular assemblies, including a megadalton-scale tRNA multi-synthetase complex. The resulting map offers a cross-species view of conserved, stable protein assemblies shared across plant cells and provides a mechanistic, biochemical framework for interpreting plant genetics and mutant phenotypes.
Affiliation(s)
- Claire D McWhite
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Ophelia Papoulas
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Kevin Drew
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Rachael M Cox
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Viviana June
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Oliver Xiaoou Dong
- Department of Plant Pathology and The Genome Center, University of California, Davis, Davis, CA 95616, USA; Joint Bioenergy Institute, Emeryville, CA 94608, USA
- Taejoon Kwon
- Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, Republic of Korea
- Cuihong Wan
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA; Hubei Key Lab of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, No. 152 Luoyu Road, Wuhan 430079, P.R. China
- Mari L Salmi
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Stanley J Roux
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Karen S Browning
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Z Jeffrey Chen
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA
- Pamela C Ronald
- Department of Plant Pathology and The Genome Center, University of California, Davis, Davis, CA 95616, USA; Joint Bioenergy Institute, Emeryville, CA 94608, USA
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX 78712, USA.
10
Guo T, Luna A, Rajapakse VN, Koh CC, Wu Z, Liu W, Sun Y, Gao H, Menden MP, Xu C, Calzone L, Martignetti L, Auwerx C, Buljan M, Banaei-Esfahani A, Ori A, Iskar M, Gillet L, Bi R, Zhang J, Zhang H, Yu C, Zhong Q, Varma S, Schmitt U, Qiu P, Zhang Q, Zhu Y, Wild PJ, Garnett MJ, Bork P, Beck M, Liu K, Saez-Rodriguez J, Elloumi F, Reinhold WC, Sander C, Pommier Y, Aebersold R. Quantitative Proteome Landscape of the NCI-60 Cancer Cell Lines. iScience 2019; 21:664-680. [PMID: 31733513 PMCID: PMC6889472 DOI: 10.1016/j.isci.2019.10.059] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/11/2019] [Revised: 10/21/2019] [Accepted: 10/28/2019] [Indexed: 12/15/2022] Open
Abstract
Here we describe a proteomic data resource for the NCI-60 cell lines generated by pressure cycling technology and SWATH mass spectrometry. We developed the DIA-expert software to curate and visualize the SWATH data, leading to reproducible detection of over 3,100 SwissProt proteotypic proteins and systematic quantification of pathway activities. Stoichiometric relationships of interacting proteins for DNA replication, repair, the chromatin remodeling NuRD complex, β-catenin, RNA metabolism, and prefoldins are more evident than at the mRNA level. The data are available in CellMiner (discover.nci.nih.gov/cellminercdb and discover.nci.nih.gov/cellminer), allowing casual users to test hypotheses and perform integrative, cross-database analyses of multi-omic drug response correlations for over 20,000 drugs. We demonstrate the value of proteome data in predicting drug response for over 240 clinically relevant chemotherapeutic and targeted therapies. In summary, we present a novel proteome resource for the NCI-60, together with relevant software tools, and demonstrate the benefit of proteome analyses.
- High-quality NCI-60 proteotypes created using pressure cycling technology and SWATH-MS
- Proteotypes improve drug response prediction in multi-omics regression analysis
- ∼3000 measured proteins allow investigation into protein complex stoichiometry
- CellMinerCDB (discover.nci.nih.gov/cellminercdb) portal allows dataset exploration
Affiliation(s)
- Tiannan Guo
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- Augustin Luna
- cBio Center, Division of Biostatistics, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Vinodh N Rajapakse
- Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Ching Chiek Koh
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Zhicheng Wu
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China
- Wei Liu
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China; Department of Clinical Pharmacology, College of Pharmacy, Dalian Medical University, Dalian, Liaoning, China
- Yaoting Sun
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China
- Huanhuan Gao
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China
- Michael P Menden
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Aachen, Germany; Bioscience, Oncology, IMED Biotech Unit, AstraZeneca, Cambridge, UK
- Chao Xu
- Faculty of Software, Fujian Normal University, Fuzhou, China
- Laurence Calzone
- Institut Curie, PSL Research University, INSERM, U900, Mines Paris Tech 75005, Paris, France
- Loredana Martignetti
- Institut Curie, PSL Research University, INSERM, U900, Mines Paris Tech 75005, Paris, France
- Chiara Auwerx
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Marija Buljan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Amir Banaei-Esfahani
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; PhD Program in Systems Biology, Life Science Zurich Graduate School, University of Zurich and ETH Zurich, Zurich, Switzerland
- Alessandro Ori
- Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Beutenbergstrasse 11, 07745 Jena, Germany
- Murat Iskar
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Ludovic Gillet
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ran Bi
- Department of Clinical Pharmacology, College of Pharmacy, Dalian Medical University, Dalian, Liaoning, China
- Jiangnan Zhang
- Department of Clinical Pharmacology, College of Pharmacy, Dalian Medical University, Dalian, Liaoning, China
- Huanhuan Zhang
- Key Laboratory of Experimental Animal and Safety Evaluation, Zhejiang Academy of Medical Sciences, Hangzhou, Zhejiang, China
- Chenhuan Yu
- Key Laboratory of Experimental Animal and Safety Evaluation, Zhejiang Academy of Medical Sciences, Hangzhou, Zhejiang, China
- Qing Zhong
- Institute of Surgical Pathology, University Hospital Zurich, Zurich, Switzerland; Cancer Data Science Group, Children's Medical Research Institute, University of Sydney, Sydney, NSW, Australia
- Uwe Schmitt
- Scientific IT Services, ETH Zurich, Zurich, Switzerland
- Peng Qiu
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Dr., Atlanta, GA 30332, USA
- Qiushi Zhang
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China
- Yi Zhu
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, P. R. China; Guomics Laboratory of Proteomic Big Data, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Peter J Wild
- Institute of Surgical Pathology, University Hospital Zurich, Zurich, Switzerland
- Mathew J Garnett
- Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69120 Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Martin Beck
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Kexin Liu
- Department of Clinical Pharmacology, College of Pharmacy, Dalian Medical University, Dalian, Liaoning, China
- Julio Saez-Rodriguez
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Aachen, Germany
- Fathi Elloumi
- Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- William C Reinhold
- Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chris Sander
- cBio Center, Division of Biostatistics, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Yves Pommier
- Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland.
11
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms has proven daunting because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Affiliation(s)
- Wilson Wen Bin Goh
- Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore.
- Subhash Thalappilly
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Guillaume Thibault
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore; Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore.
| |
|
12
|
Proteomic investigation of intra-tumor heterogeneity using network-based contextualization - A case study on prostate cancer. J Proteomics 2019; 206:103446. [PMID: 31323421 DOI: 10.1016/j.jprot.2019.103446] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/17/2019] [Revised: 06/12/2019] [Accepted: 07/08/2019] [Indexed: 12/26/2022]
Abstract
Cancer is a heterogeneous disease, confounding the identification of relevant markers and drug targets. Network-based analysis is robust against noise, potentially offering a promising approach towards biomarker identification. We describe here the application of two network-based methods, qPSP (Quantitative Proteomics Signature Profiling) and PFSNet (Paired Fuzzy SubNetworks), in an intra-tissue proteome data set of prostate tissue samples. Despite high basal variation, we find that traditional statistical analysis may exaggerate the extent of heterogeneity. We also report that network-based analysis outperforms protein-based feature selection with concomitantly higher cross-validation accuracy. Overall, network-based analysis provides emergent signal that boosts sensitivity while retaining good precision. It is a potential means of circumventing heterogeneity for stable biomarker discovery.
|
13
|
Bergendahl LT, Gerasimavicius L, Miles J, Macdonald L, Wells JN, Welburn JPI, Marsh JA. The role of protein complexes in human genetic disease. Protein Sci 2019; 28:1400-1411. [PMID: 31219644 DOI: 10.1002/pro.3667] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 12/20/2022]
Abstract
Many human genetic disorders are caused by mutations in protein-coding regions of DNA. Taking protein structure into account has therefore provided key insight into the molecular mechanisms underlying human genetic disease. Although most studies have focused on the intramolecular effects of mutations, the critical role of the assembly of proteins into complexes is being increasingly recognized. Here, we review multiple ways in which consideration of protein complexes can help us to understand and explain the effects of pathogenic mutations. First, we discuss disorders caused by mutations that perturb intersubunit interactions in homomeric and heteromeric complexes. Second, we address how protein complex assembly can facilitate a dominant-negative mechanism, whereby mutated subunits can disrupt the activity of wild-type protein. Third, we show how mutations that change protein expression levels can lead to damaging stoichiometric imbalances. Finally, we review how mutations affecting different subunits of the same heteromeric complex often cause similar diseases, whereas mutations in different interfaces of the same subunit can cause distinct phenotypes.
Affiliation(s)
- L Therese Bergendahl
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Lukas Gerasimavicius
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Jamilla Miles
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Lewis Macdonald
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Jonathan N Wells
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, 14850
| | - Julie P I Welburn
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3BF, United Kingdom
| | - Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| |
|
14
|
Zhao Y, Sue ACH, Goh WWB. Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data. J Bioinform Comput Biol 2019; 17:1950013. [DOI: 10.1142/s0219720019500136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from new proteomics technology (SWATH) and also check for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluate the objectivity of the FCS p-value and follow up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective and are strongly confounded by complex size; (2) the best recovery performance does not necessarily lie at standard p-value cutoffs; (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes and are further confounded by issues relating to network coverage and quality; and (4) moderately sized complexes of size 5 to 10 still exhibit considerable instability, and we find that FCS works best with big complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
Affiliation(s)
- Yaxing Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
| | - Andrew Chi-Hau Sue
- School of Pharmaceutical Science and Technology, Tianjin University, No. 92, Weijin Road, 30072 Tianjin, P. R. China
| | - Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
| |
|
15
|
Boyle EA, Pritchard JK, Greenleaf WJ. High-resolution mapping of cancer cell networks using co-functional interactions. Mol Syst Biol 2018; 14:e8594. [PMID: 30573688 PMCID: PMC6300813 DOI: 10.15252/msb.20188594] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 08/06/2018] [Revised: 11/26/2018] [Accepted: 11/30/2018] [Indexed: 12/26/2022] Open
Abstract
Powerful new technologies for perturbing genetic elements have recently expanded the study of genetic interactions in model systems ranging from yeast to human cell lines. However, technical artifacts can confound signal across genetic screens and limit the immense potential of parallel screening approaches. To address this problem, we devised a novel PCA-based method for correcting genome-wide screening data, bolstering the sensitivity and specificity of detection for genetic interactions. Applying this strategy to a set of 436 whole genome CRISPR screens, we report more than 1.5 million pairs of correlated "co-functional" genes that provide finer-scale information about cell compartments, biological pathways, and protein complexes than traditional gene sets. Lastly, we employed a gene community detection approach to implicate core genes for cancer growth and compress signal from functionally related genes in the same community into a single score. This work establishes new algorithms for probing cancer cell networks and motivates the acquisition of further CRISPR screen data across diverse genotypes and cell types to further resolve complex cellular processes.
Affiliation(s)
- Evan A Boyle
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Biology, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford, CA, USA
| | - William J Greenleaf
- Department of Genetics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| |
|
16
|
Jiao N, Qi Y, Lv C, Li H, Yang F. Identification of protein complexes associated with myocardial infarction using a bioinformatics approach. Mol Med Rep 2018; 18:3569-3576. [PMID: 30132549 PMCID: PMC6131540 DOI: 10.3892/mmr.2018.9414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2016] [Accepted: 01/03/2018] [Indexed: 11/16/2022] Open
Abstract
Myocardial infarction (MI) is a leading cause of mortality and disability worldwide. Determination of the molecular mechanisms underlying the disease is crucial for identifying possible therapeutic targets and designing effective treatments. On the basis that MI may be caused by dysfunctional protein complexes rather than single genes, the present study aimed to use a bioinformatics approach to identify complexes that may serve important roles in the development of MI. Investigation of the proteins involved in the identified complexes showed that numerous proteins have been reported to be related to MI, whereas other proteins interact with MI-related proteins, which implies that these protein complexes may indeed be related to the development of MI. The protein complexes detected in the present study may aid in our understanding of the molecular mechanisms that underlie MI pathogenesis.
Affiliation(s)
- Nianhui Jiao
- Intensive Care Unit, Laiwu People's Hospital, Laiwu, Shandong 271199, P.R. China
| | - Yongjie Qi
- Intensive Care Unit, Laiwu People's Hospital, Laiwu, Shandong 271199, P.R. China
| | - Changli Lv
- Emergency Department, Laiwu People's Hospital, Laiwu, Shandong 271199, P.R. China
| | - Hongjun Li
- Emergency Department, The Central Hospital of Tai'an, Tai'an, Shandong 271000, P.R. China
| | - Fengyong Yang
- Intensive Care Unit, Laiwu People's Hospital, Laiwu, Shandong 271199, P.R. China
| |
|
17
|
Genome-wide predicting disease-related protein complexes by walking on the heterogeneous network based on data integration and laplacian normalization. Comput Biol Chem 2017; 69:41-47. [DOI: 10.1016/j.compbiolchem.2017.04.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/23/2016] [Revised: 04/08/2017] [Accepted: 04/12/2017] [Indexed: 11/20/2022]
|
18
|
Abstract
Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/, and online documentation is available at http://rpubs.com/gohwils/204259.
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin 300072, China; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551; Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417; Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 119074
| |
|
19
|
Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol Syst Biol 2017; 13:932. [PMID: 28596423 PMCID: PMC5488662 DOI: 10.15252/msb.20167490] [Citation(s) in RCA: 143] [Impact Index Per Article: 20.4] [Indexed: 12/13/2022] Open
Abstract
Macromolecular protein complexes carry out many of the essential functions of cells, and many genetic diseases arise from disrupting the functions of such complexes. Currently, there is great interest in defining the complete set of human protein complexes, but recent published maps lack comprehensive coverage. Here, through the synthesis of over 9,000 published mass spectrometry experiments, we present hu.MAP, the most comprehensive and accurate human protein complex map to date, containing > 4,600 total complexes, > 7,700 proteins, and > 56,000 unique interactions, including thousands of confident protein interactions not identified by the original publications. hu.MAP accurately recapitulates known complexes withheld from the learning procedure, which was optimized with the aid of a new quantitative metric (k‐cliques) for comparing sets of sets. The vast majority of complexes in our map are significantly enriched with literature annotations, and the map overall shows improved coverage of many disease‐associated proteins, as we describe in detail for ciliopathies. Using hu.MAP, we predicted and experimentally validated candidate ciliopathy disease genes in vivo in a model vertebrate, discovering CCDC138, WDR90, and KIAA1328 to be new cilia basal body/centriolar satellite proteins, and identifying ANKRD55 as a novel member of the intraflagellar transport machinery. By offering significant improvements to the accuracy and coverage of human protein complexes, hu.MAP (http://proteincomplexes.org) serves as a valuable resource for better understanding the core cellular functions of human proteins and helping to determine mechanistic foundations of human disease.
Affiliation(s)
- Kevin Drew
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA
| | - Chanjae Lee
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Ryan L Huizar
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Fan Tu
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Blake Borgeson
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Claire D McWhite
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Yun Ma
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA; The Otolaryngology Hospital, The First Affiliated Hospital of Sun Yat-sen University, Sun Yat-sen University, Guangzhou, China
| | - John B Wallingford
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| |
|
20
|
Goh WWB, Wong L. Class-paired Fuzzy SubNETs: A paired variant of the rank-based network analysis family for feature selection based on protein complexes. Proteomics 2017; 17:e1700093. [PMID: 28390171 DOI: 10.1002/pmic.201700093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/09/2017] [Accepted: 04/05/2017] [Indexed: 01/12/2023]
Abstract
Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank-defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, P. R. China; Department of Computer Science, National University of Singapore, Singapore
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, National University of Singapore, Singapore
| |
|
21
|
Goh WWB, Wong L. Protein complex-based analysis is resistant to the obfuscating consequences of batch effects - a case study in clinical proteomics. BMC Genomics 2017; 18:142. [PMID: 28361693 PMCID: PMC5374662 DOI: 10.1186/s12864-017-3490-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/06/2023] Open
Abstract
Background: In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research. Results: Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs highly correlated with batch but not class effect. However, neither PC-based nor existing batch effect-correction methods adequately address subtle batch effects, which are difficult to eradicate, and both involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch-effect resistant methods and demonstrate how such methods incorporating protein complexes are particularly resistant to batch effects without compromising data integrity. Conclusions: Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance against batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3490-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, 300072, People's Republic of China; Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore.
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore; Department of Pathology, National University of Singapore, Singapore, Singapore.
| |
|
22
|
Goh WWB, Wong L. Integrating Networks and Proteomics: Moving Forward. Trends Biotechnol 2016; 34:951-959. [DOI: 10.1016/j.tibtech.2016.05.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/19/2016] [Revised: 05/23/2016] [Accepted: 05/24/2016] [Indexed: 11/28/2022]
|
23
|
Yang MQ, Elnitski L. A Systems Biology Comparison of Ovarian Cancers Implicates Putative Somatic Driver Mutations through Protein-Protein Interaction Models. PLoS One 2016; 11:e0163353. [PMID: 27788148 PMCID: PMC5082879 DOI: 10.1371/journal.pone.0163353] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/04/2016] [Accepted: 09/07/2016] [Indexed: 12/14/2022] Open
Abstract
Ovarian carcinomas can be aggressive with a high mortality rate (e.g., high-grade serous ovarian carcinomas, or HGSOCs), or indolent with much better long-term outcomes (e.g., low-malignant-potential, or LMP, serous ovarian carcinomas). By comparing LMP and HGSOC tumors, we can gain insight into the mechanisms underlying malignant progression in ovarian cancer. However, previous studies of the two subtypes have been focused on gene expression analysis. Here, we applied a systems biology approach, integrating gene expression profiles derived from two independent data sets containing both LMP and HGSOC tumors with protein-protein interaction data. Genes and related networks implicated by both data sets involved both known and novel disease mechanisms and highlighted the different roles of BRCA1 and CREBBP in the two tumor types. In addition, the incorporation of somatic mutation data revealed that amplification of PAK4 is associated with poor survival in patients with HGSOC. Thus, perturbations in protein interaction networks demonstrate differential trafficking of network information between malignant and benign ovarian cancers. The novel network-based molecular signatures identified here may be used to identify new targets for intervention and to improve the treatment of invasive ovarian cancer as well as early diagnosis.
Affiliation(s)
- Mary Qu Yang
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program, University of Arkansas at Little Rock and University of Arkansas for Medical Sciences, 2801 S. University Avenue, Little Rock, Arkansas, 72204, United States of America
| | - Laura Elnitski
- National Human Genome Research Institute, National Institutes of Health, Rockville, MD, 20852, United States of America
| |
|
24
|
He FQ, Ollert M. Network-Guided Key Gene Discovery for a Given Cellular Process. Adv Biochem Eng Biotechnol 2016. [PMID: 27783134 DOI: 10.1007/10_2016_39] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/10/2022]
Abstract
Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis, have already made enormous contributions to identifying key genes associated with a given disease or phenotype, although their success depends heavily on a huge number of samples. Recent advances in network biology, especially network inference directly from genome-scale data and the follow-up network analysis, open up new avenues to predict key genes driving a given biological process or cellular function. Here we review and compare the current approaches to predicting key genes that have no chance of standing out in classic differential expression analysis, from gene-regulatory, protein-protein interaction, or gene expression correlation networks. We elaborate on these network-based approaches mainly in the context of immunology and infection, and urge greater use of correlation network-based predictions. Such network-based key gene discovery approaches driven by information-enriched 'omics' data should be very useful for systematic key gene discoveries for any given biochemical process or cellular function, and also valuable for novel drug target discovery and novel diagnostic, prognostic and therapeutic-efficiency marker prediction for a specific disease or disorder.
Affiliation(s)
- Feng Q He
- Department of Infection and Immunity, Group of Immune Systems Biology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg.
| | - Markus Ollert
- Department of Infection and Immunity, Group of Allergy and Clinical Immunology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, 5000, Odense C, Denmark
| |
|
25
|
Goh WWB, Wong L. Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms. J Proteome Res 2016; 15:3167-79. [DOI: 10.1021/acs.jproteome.6b00402] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023]
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 117417
| |
|
26
|
Chen JX, Cipriani PG, Mecenas D, Polanowska J, Piano F, Gunsalus KC, Selbach M. In Vivo Interaction Proteomics in Caenorhabditis elegans Embryos Provides New Insights into P Granule Dynamics. Mol Cell Proteomics 2016; 15:1642-57. [PMID: 26912668 PMCID: PMC4858945 DOI: 10.1074/mcp.m115.053975] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/30/2015] [Revised: 02/24/2016] [Indexed: 01/20/2023] Open
Abstract
Studying protein interactions in whole organisms is fundamental to understanding development. Here, we combine in vivo expressed GFP-tagged proteins with quantitative proteomics to identify protein-protein interactions of selected key proteins involved in early C. elegans embryogenesis. Co-affinity purification of interaction partners for eight bait proteins resulted in a pilot in vivo interaction map of proteins with a focus on early development. Our network reflects known biology and is highly enriched in functionally relevant interactions. To demonstrate the utility of the map, we looked for new regulators of P granule dynamics and found that GEI-12, a novel binding partner of the DYRK family kinase MBK-2, is a key regulator of P granule formation and germline maintenance. Our data corroborate a recently proposed model in which the phosphorylation state of GEI-12 controls P granule dynamics. In addition, we find that GEI-12 also induces granule formation in mammalian cells, suggesting a common regulatory mechanism in worms and humans. Our results show that in vivo interaction proteomics provides unique insights into animal development.
Affiliation(s)
- Jia-Xuan Chen
- Max Delbrück Center for Molecular Medicine, D-13092 Berlin, Germany; Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
| | - Patricia G Cipriani
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003; New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| | - Desirea Mecenas
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
| | - Jolanta Polanowska
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003; INSERM, U1104, 13288 Marseille, France
| | - Fabio Piano
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003; New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| | - Kristin C Gunsalus
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003; New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| | - Matthias Selbach
- Max Delbrück Center for Molecular Medicine, D-13092 Berlin, Germany; Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany.
| |
|
27
|
Using contrast patterns between true complexes and random subgraphs in PPI networks to predict unknown protein complexes. Sci Rep 2016; 6:21223. [PMID: 26868667 PMCID: PMC4751475 DOI: 10.1038/srep21223] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/08/2015] [Accepted: 01/19/2016] [Indexed: 02/02/2023] Open
Abstract
Most protein complex detection methods utilize unsupervised techniques to cluster densely connected nodes in a protein-protein interaction (PPI) network, in spite of the fact that many true complexes are not dense subgraphs. Supervised methods have been proposed recently, but they do not answer why a group of proteins are predicted as a complex, and they have not investigated how to detect new complexes of one species by training the model on the PPI data of another species. We propose a novel supervised method to address these issues. The key idea is to discover emerging patterns (EPs), a type of contrast pattern, which can clearly distinguish true complexes from random subgraphs in a PPI network. An integrative score of EPs is defined to measure how likely a subgraph of proteins can form a complex. New complexes thus can grow from our seed proteins by iteratively updating this score. The performance of our method is tested on eight benchmark PPI datasets and compared with seven unsupervised methods, two supervised and one semi-supervised methods under five standards to assess the quality of the predicted complexes. The results show that in most cases our method achieved a better performance, sometimes significantly.
|
28
|
Rizzetto S, Priami C, Csikász-Nagy A. Qualitative and Quantitative Protein Complex Prediction Through Proteome-Wide Simulations. PLoS Comput Biol 2015; 11:e1004424. [PMID: 26492574 PMCID: PMC4619657 DOI: 10.1371/journal.pcbi.1004424] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/18/2014] [Accepted: 06/22/2015] [Indexed: 12/18/2022] Open
Abstract
Despite recent progress in proteomics, most protein complexes are still unknown. Identification of these complexes will help us understand cellular regulatory mechanisms and support development of new drugs. It is therefore important to establish detailed information about the composition and the abundance of protein complexes, but existing algorithms can only give qualitative predictions. Herein, we propose a new approach based on stochastic simulations of protein complex formation that integrates multi-source data (such as protein abundances, domain-domain interactions and functional annotations) to predict alternative forms of protein complexes together with their abundances. This method, called SiComPre (Simulation based Complex Prediction), achieves better qualitative prediction of yeast and human protein complexes than existing methods and is the first to predict protein complex abundances. Furthermore, we show that SiComPre can be used to predict complexome changes upon drug treatment with the example of bortezomib. SiComPre is the first method to produce quantitative predictions on the abundance of molecular complexes while performing the best qualitative predictions. With new data on tissue-specific protein complexes becoming available, SiComPre will be able to predict qualitative and quantitative differences in the complexome in various tissue types and under various conditions.
Affiliation(s)
- Simone Rizzetto
- The Microsoft Research-University of Trento Centre for Computational Systems Biology, Rovereto, Italy
| | - Corrado Priami
- The Microsoft Research-University of Trento Centre for Computational Systems Biology, Rovereto, Italy
- Department of Mathematics, University of Trento, Povo (TN), Italy
- * E-mail: (CP); (ACN)
| | - Attila Csikász-Nagy
- Department of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy
- Randall Division of Cell and Molecular Biophysics and Institute for Mathematical and Molecular Biomedicine, King's College London, London, United Kingdom
- * E-mail: (CP); (ACN)
|
29
|
Chen B, Li M, Wang J, Shang X, Wu FX. A fast and high performance multiple data integration algorithm for identifying human disease genes. BMC Med Genomics 2015; 8 Suppl 3:S2. [PMID: 26399620 PMCID: PMC4582601 DOI: 10.1186/1755-8794-8-s3-s2] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Integrating multiple data sources is indispensable for improving disease gene identification, not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need improvement. RESULTS In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene being associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. CONCLUSIONS The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 by using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. It is better than many existing algorithms.
Affiliation(s)
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
| | - Min Li
- School of Information Science and Engineering, Central South University, 410083, Changsha, P.R.China
| | - Jianxin Wang
- School of Information Science and Engineering, Central South University, 410083, Changsha, P.R.China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada
- Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada
|
30
|
Carroll AJ, Zhang P, Whitehead L, Kaines S, Tcherkez G, Badger MR. PhenoMeter: A Metabolome Database Search Tool Using Statistical Similarity Matching of Metabolic Phenotypes for High-Confidence Detection of Functional Links. Front Bioeng Biotechnol 2015; 3:106. [PMID: 26284240 PMCID: PMC4518198 DOI: 10.3389/fbioe.2015.00106] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/27/2015] [Accepted: 07/10/2015] [Indexed: 12/14/2022] Open
Abstract
This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher's Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher's Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).
Affiliation(s)
- Adam J. Carroll
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - Peng Zhang
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - Lynne Whitehead
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - Sarah Kaines
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - Guillaume Tcherkez
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - Murray R. Badger
- College of Medicine, Biology and Environment, Research School of Biology, The Australian National University, Canberra, ACT, Australia
|
31
|
Luo J, Qi Y. Identification of Essential Proteins Based on a New Combination of Local Interaction Density and Protein Complexes. PLoS One 2015; 10:e0131418. [PMID: 26125187 PMCID: PMC4488326 DOI: 10.1371/journal.pone.0131418] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/20/2015] [Accepted: 06/02/2015] [Indexed: 11/18/2022] Open
Abstract
Background Computational approaches aided by computer science have been used to predict essential proteins and are faster than expensive, time-consuming, laborious experimental approaches. However, the performance of such approaches is still poor, making practical applications of computational approaches difficult in some fields. Hence, the development of more suitable and efficient computing methods is necessary for identification of essential proteins. Method In this paper, we propose a new method for predicting essential proteins in a protein interaction network, local interaction density combined with protein complexes (LIDC), based on statistical analyses of essential proteins and protein complexes. First, we introduce a new local topological centrality, local interaction density (LID), of the yeast PPI network; second, we discuss a new integration strategy for multiple bioinformatics. The LIDC method was then developed through a combination of LID and protein complex information based on our new integration strategy. The purpose of LIDC is discovery of important features of essential proteins with their neighbors in real protein complexes, thereby improving the efficiency of identification. Results Experimental results based on three different PPI (protein-protein interaction) networks of Saccharomyces cerevisiae and Escherichia coli showed that LIDC outperformed classical topological centrality measures and some recent combinational methods. Moreover, when predicting MIPS datasets, LIDC improved performance over all nine reference methods (i.e., DC, BC, NC, LID, PeC, CoEWC, WDC, ION, and UC). Conclusions LIDC is more effective for the prediction of essential proteins than other recently developed methods.
Affiliation(s)
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Yi Qi
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
32
|
Pällmann N, Braig M, Sievert H, Preukschas M, Hermans-Borgmeyer I, Schweizer M, Nagel CH, Neumann M, Wild P, Haralambieva E, Hagel C, Bokemeyer C, Hauber J, Balabanov S. Biological Relevance and Therapeutic Potential of the Hypusine Modification System. J Biol Chem 2015; 290:18343-60. [PMID: 26037925 DOI: 10.1074/jbc.m115.664490] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 05/14/2015] [Indexed: 11/06/2022] Open
Abstract
Hypusine modification of the eukaryotic initiation factor 5A (eIF-5A) is emerging as a crucial regulator in cancer, infections, and inflammation. Although its contribution in translational regulation of proline repeat-rich proteins has been sufficiently demonstrated, its biological role in higher eukaryotes remains poorly understood. To establish the hypusine modification system as a novel platform for therapeutic strategies, we aimed to investigate its functional relevance in mammals by generating and using a range of new knock-out mouse models for the hypusine-modifying enzymes deoxyhypusine synthase and deoxyhypusine hydroxylase as well as for the cancer-related isoform eIF-5A2. We discovered that homozygous depletion of deoxyhypusine synthase and/or deoxyhypusine hydroxylase causes lethality in adult mice with different penetrance compared with haploinsufficiency. Network-based bioinformatic analysis of proline repeat-rich proteins, which are putative eIF-5A targets, revealed that these proteins are organized in highly connected protein-protein interaction networks. Hypusine-dependent translational control of essential proteins (hubs) and protein complexes inside these networks might explain the lethal phenotype observed after deletion of hypusine-modifying enzymes. Remarkably, our results also demonstrate that the cancer-associated isoform eIF-5A2 is dispensable for normal development and viability. Together, our results provide the first genetic evidence that the hypusine modification in eIF-5A is crucial for homeostasis in mammals. Moreover, these findings highlight functional diversity of the hypusine system compared with lower eukaryotes and indicate eIF-5A2 as a valuable and safe target for therapeutic intervention in cancer.
Affiliation(s)
- Nora Pällmann
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center, the Heinrich Pette Institute, Leibniz Institute for Experimental Virology, 20251 Hamburg, Germany
| | - Melanie Braig
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center, the Division of Hematology and
| | - Henning Sievert
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center
| | - Michael Preukschas
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center, the Department of Molecular Pathology, Institute for Hematopathology, 22547 Hamburg, Germany
| | - Claus Henning Nagel
- the Heinrich Pette Institute, Leibniz Institute for Experimental Virology, 20251 Hamburg, Germany
| | - Melanie Neumann
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Peter Wild
- Institute of Surgical Pathology, University Hospital Zurich, 8091 Zurich, Switzerland
| | - Eugenia Haralambieva
- Institute of Surgical Pathology, University Hospital Zurich, 8091 Zurich, Switzerland
| | - Christian Hagel
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Carsten Bokemeyer
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center
| | - Joachim Hauber
- the Heinrich Pette Institute, Leibniz Institute for Experimental Virology, 20251 Hamburg, Germany
| | - Stefan Balabanov
- From the Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, Hubertus Wald Tumor Center, the Division of Hematology and
|
33
|
Le DH. A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks. Algorithms Mol Biol 2015; 10:14. [PMID: 25969691 PMCID: PMC4427953 DOI: 10.1186/s13015-015-0044-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/30/2014] [Accepted: 04/01/2015] [Indexed: 12/21/2022] Open
Abstract
Background Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in causing disease are not yet well understood. There exist only a few studies for the identification of disease-associated protein complexes. However, they mostly utilize complicated heterogeneous networks which are constructed based on an out-of-date database of phenotype similarity network collected from literature. In addition, they apply only to diseases for which tissue-specific data exist. Methods In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks where two protein complexes are functionally connected by either shared protein elements, shared annotating GO terms or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Results Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms including one we used in our previous study, we found that it performed statistically significantly better than these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the ways to construct the functional similarity protein complex networks and the used algorithms. The performance of our method was also higher than that reported for some existing methods which were based on complicated heterogeneous networks.
Finally, we also tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were supported by evidence, since at least one of their protein elements is known to be associated with prostate cancer. Conclusions Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations. Electronic supplementary material The online version of this article (doi:10.1186/s13015-015-0044-6) contains supplementary material, which is available to authorized users.
|
34
|
Malty RH, Jessulat M, Jin K, Musso G, Vlasblom J, Phanse S, Zhang Z, Babu M. Mitochondrial targets for pharmacological intervention in human disease. J Proteome Res 2014; 14:5-21. [PMID: 25367773 PMCID: PMC4286170 DOI: 10.1021/pr500813f] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/28/2022]
Abstract
Over the past several years, mitochondrial dysfunction has been linked to an increasing number of human illnesses, making mitochondrial proteins (MPs) an ever more appealing target for therapeutic intervention. With 20% of the mitochondrial proteome (312 of an estimated 1500 MPs) having known interactions with small molecules, MPs appear to be highly targetable. Yet, despite these targeted proteins functioning in a range of biological processes (including induction of apoptosis, calcium homeostasis, and metabolism), very few of the compounds targeting MPs find clinical use. Recent work has greatly expanded the number of proteins known to localize to the mitochondria and has generated a considerable increase in MP 3D structures available in public databases, allowing experimental screening and in silico prediction of mitochondrial drug targets on an unprecedented scale. Here, we summarize the current literature on clinically active drugs that target MPs, with a focus on how existing drug targets are distributed across biochemical pathways and organelle substructures. Also, we examine current strategies for mitochondrial drug discovery, focusing on genetic, proteomic, and chemogenomic assays, and relevant model systems. As cell models and screening techniques improve, MPs appear poised to emerge as relevant targets for a wide range of complex human diseases, an eventuality that can be expedited through systematic analysis of MP function.
Affiliation(s)
- Ramy H Malty
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan S4S 0A2, Canada
|
35
|
Garcia-Alonso L, Jiménez-Almazán J, Carbonell-Caballero J, Vela-Boza A, Santoyo-López J, Antiñolo G, Dopazo J. The role of the interactome in the maintenance of deleterious variability in human populations. Mol Syst Biol 2014; 10:752. [PMID: 25261458 PMCID: PMC4299661 DOI: 10.15252/msb.20145222] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/22/2014] [Revised: 08/23/2014] [Accepted: 08/28/2014] [Indexed: 12/25/2022] Open
Abstract
Recent genomic projects have revealed the existence of an unexpectedly large amount of deleterious variability in the human genome. Several hypotheses have been proposed to explain such an apparently high mutational load. However, the mechanisms by which deleterious mutations in some genes cause a pathological effect but are apparently innocuous in other genes remain largely unknown. This study searched for deleterious variants in the 1,000 genomes populations, as well as in a newly sequenced population of 252 healthy Spanish individuals. In addition, variants causative of monogenic diseases and somatic variants from 41 chronic lymphocytic leukaemia patients were analysed. The deleterious variants found were analysed in the context of the interactome to understand the role of network topology in the maintenance of the observed mutational load. Our results suggest that one of the mechanisms whereby the effect of these deleterious variants on the phenotype is suppressed could be related to the configuration of the protein interaction network. Most of the deleterious variants observed in healthy individuals are concentrated in peripheral regions of the interactome, in combinations that preserve their connectivity, and have a marginal effect on interactome integrity. On the contrary, likely pathogenic cancer somatic deleterious variants tend to occur in internal regions of the interactome, often with associated structural consequences. Finally, variants causative of monogenic diseases seem to occupy an intermediate position. Our observations suggest that the real pathological potential of a variant might be more a systems property rather than an intrinsic property of individual proteins.
Affiliation(s)
- Luz Garcia-Alonso
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
| | - Jorge Jiménez-Almazán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
| | - Jose Carbonell-Caballero
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
| | - Alicia Vela-Boza
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
| | - Javier Santoyo-López
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
| | - Guillermo Antiñolo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocio/Consejo Superior de Investigaciones Científicas/University of Seville, Seville, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Seville, Spain
| | - Joaquin Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Functional Genomics Node, (INB) at CIPF, Valencia, Spain
|
36
|
Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, Wang X, Mort M, Stenson PD, Cooper DN, Yu H. Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks. Hum Mutat 2014; 35:585-93. [PMID: 24599843 DOI: 10.1002/humu.22534] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/26/2013] [Accepted: 02/14/2014] [Indexed: 01/24/2023]
Abstract
With the rapid growth of structural genomics, numerous protein crystal structures have become available. However, the parallel increase in knowledge of the functional principles underlying biological processes, and more specifically the underlying molecular mechanisms of disease, has been less dramatic. This notwithstanding, the study of complex cellular networks has made possible the inference of protein functions on a large scale. Here, we combine the scale of network systems biology with the resolution of traditional structural biology to generate a large-scale atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with a well-defined interaction interface and interface residues for each interaction. Within the framework of this atomic-resolution network, we have explored the structural principles underlying variations causing human-inherited disease. We find that in-frame pathogenic variations are enriched at both the interface and in the interacting domain, suggesting that variations not only at interface "hot-spots," but in the entire interacting domain can result in alterations of interactions. Further, the sites of pathogenic variations are closely related to the biophysical strength of the interactions they perturb. Finally, we show that biochemical alterations consequent to these variations are considerably more disruptive than evolutionary changes, with the most significant alterations at the protein interaction interface.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
|
37
|
Ryan CJ, Krogan NJ, Cunningham P, Cagney G. All or nothing: protein complexes flip essentiality between distantly related eukaryotes. Genome Biol Evol 2013; 5:1049-59. [PMID: 23661563 PMCID: PMC3698920 DOI: 10.1093/gbe/evt074] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/20/2022] Open
Abstract
In the budding yeast Saccharomyces cerevisiae, the subunits of any given protein complex are either mostly essential or mostly nonessential, suggesting that essentiality is a property of molecular machines rather than individual components. There are exceptions to this rule, however, that is, nonessential genes in largely essential complexes and essential genes in largely nonessential complexes. Here, we provide explanations for these exceptions, showing that redundancy within complexes, as revealed by genetic interactions, can explain many of the former cases, whereas “moonlighting,” as revealed by membership of multiple complexes, can explain the latter. Surprisingly, we find that redundancy within complexes cannot usually be explained by gene duplication, suggesting alternate buffering mechanisms. In the distantly related Schizosaccharomyces pombe, we observe the same phenomenon of modular essentiality, suggesting that it may be a general feature of eukaryotes. Furthermore, we show that complexes flip essentiality in a cohesive fashion between the two species, that is, they tend to change from mostly essential to mostly nonessential, or vice versa, but not to mixed patterns. We show that these flips in essentiality can be explained by differing lifestyles of the two yeasts. Collectively, our results support a previously proposed model where proteins are essential because of their involvement in essential functional modules rather than because of specific topological features such as degree or centrality.
Affiliation(s)
- Colm J Ryan
- School of Computer Science and Informatics, University College Dublin, Ireland.
|
38
|
Wang YC, Deng N, Chen S, Wang Y. Computational Study of Drugs by Integrating Omics Data with Kernel Methods. Mol Inform 2013; 32:930-41. [PMID: 27481139 DOI: 10.1002/minf.201300090] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/12/2013] [Accepted: 11/13/2013] [Indexed: 01/02/2023]
Abstract
With the rapid development of genomic and chemogenomic techniques, many omics data sources for drugs have become publicly available. These data sources illustrate a drug's biological function in the living cell at different levels and from different aspects. One straightforward idea is to learn understandable rules via computational models and algorithms to mine and integrate these data sources. Here, we review our recent efforts on developing kernel-based methods to integrate drug-related omics data sources. We show three promising applications of our framework: predicting drug targets, assigning ATC-code annotations to drugs, and revealing drug repositioning opportunities. We demonstrate that data integration does provide more information and improves accuracy by recovering more experimentally observed target proteins, ATC codes, and drug repositioning cases. Importantly, data integration can indicate novel predictions which are supported by database search and functional annotation analysis and worthy of further experimental validation. In conclusion, kernel methods can efficiently integrate heterogeneous data sources to computationally study drugs, and will promote further research in drug discovery at low cost.
Affiliation(s)
- Yongcui C Wang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, No. 23, Xinning Road, Xining, Qinghai Province, P. R. China
| | - Naiyang Deng
- College of Science, China Agriculture University, No. 17, Qinghua East Road, Beijing, P. R. China
| | - Shilong Chen
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, No. 23, Xinning Road, Xining, Qinghai Province, P. R. China.
| | - Yong Wang
- National Centre for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, No. 55, Zhongguancun East Road, Beijing, P. R. China
- Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan.
|
39
|
Wang Y, Chen S, Deng N, Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS One 2013; 8:e78518. [PMID: 24244318 PMCID: PMC3823875 DOI: 10.1371/journal.pone.0078518] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 07/02/2013] [Accepted: 09/17/2013] [Indexed: 12/21/2022] Open
Abstract
Computational inference of novel therapeutic value for existing drugs, i.e., drug repositioning, offers great prospects for faster and low-risk drug development. Previous research has indicated that chemical structures, target proteins, and side-effects can provide rich information for assessing drug similarity and, in turn, disease similarity. However, each single data source is important in its own way, and data integration holds great promise for repositioning drugs more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), that integrates molecular structure, molecular activity, and phenotype data. Specifically, we characterize each drug by its profiles in chemical structure, target protein, and side-effect space, and define a kernel function to correlate drugs with diseases. We then train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effect information are all predictive of drug-disease relationships. More experimentally observed drug-disease interactions can be recovered by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive in both accuracy and coverage. Follow-up database search and pathway analysis indicate that our new predictions are worthy of further experimental validation. In particular, several novel predictions are supported by clinical trial databases, which shows the significant prospects of PreDR in future drug treatment. In conclusion, PreDR can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems.
Affiliation(s)
- Yongcui Wang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Shilong Chen
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Naiyang Deng
- College of Science, China Agricultural University, Beijing, China
- Yong Wang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
|
40
|
Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM. RIDDLE: reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network. Genome Biol 2012; 13:R125. [PMID: 23268829 PMCID: PMC4056375 DOI: 10.1186/gb-2012-13-12-r125] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/27/2012] [Accepted: 12/26/2012] [Indexed: 01/08/2023]
Abstract
The growing availability of large-scale functional networks has promoted the development of many successful techniques for predicting functions of genes. Here we extend these network-based principles and techniques to functionally characterize whole sets of genes. We present RIDDLE (Reflective Diffusion and Local Extension), which uses well developed guilt-by-association principles upon a human gene network to identify associations of gene sets. RIDDLE is particularly adept at characterizing sets with no annotations, a major challenge where most traditional set analyses fail. Notably, RIDDLE found microRNA-450a to be strongly implicated in ocular diseases and development. A web application is available at http://www.functionalnet.org/RIDDLE.
|
41
|
Cheng TMK, Gulati S, Agius R, Bates PA. Understanding cancer mechanisms through network dynamics. Brief Funct Genomics 2012; 11:543-60. [PMID: 22811516 DOI: 10.1093/bfgp/els025] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 02/11/2024]
Abstract
Cancer is a complex, multifaceted disease. Cellular systems are perturbed during both the onset and the development of cancer, and the behavioural change of tumour cells usually involves a broad range of dynamic variations. To an extent, the difficulty of monitoring this systemic change has been alleviated by recent developments in high-throughput technologies. At both the genomic and proteomic levels, technological advances in microarrays and mass spectrometry, in conjunction with computational simulations and the construction of human interactome maps, have facilitated progress in identifying disease-associated genes. On a systems level, computational approaches developed for network analysis are becoming especially useful for providing insights into the mechanisms behind tumour development and metastasis. This review emphasizes network approaches that have been developed to study cancer and provides an overview of our current knowledge of protein-protein interaction networks, and how their systemic perturbation can be analysed by two popular network simulation methods: Boolean networks and ordinary differential equations.
Affiliation(s)
- Tammy M K Cheng
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, Lincoln's Inn Fields, London WC2A 3LY, UK
|
42
|
Zeng T, Chen L. Tracing dynamic biological processes during phase transition. BMC Syst Biol 2012; 6 Suppl 1:S12. [PMID: 23046764 PMCID: PMC3403121 DOI: 10.1186/1752-0509-6-s1-s12] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
Background: Phase transitions are widespread in the biological world, for example in transitions between cell cycle phases, cell differentiation stages, and disease development. Such a nonlinear phenomenon is regarded as the conversion of a biological system from one phenotype or state to another. Studies on the molecular mechanisms of biological phase transition have attracted much attention, particularly on different genotypes (or expression variations) in a specific phase, but with less focus on cascade changes of genes' functions (or system state) during the phase shift or transition process. However, tracing the temporal characteristics of a biological system during a specific phase transition process is a fundamental and important task that can offer clues for understanding the dynamic behaviors of living organisms. Results: To overcome the hurdles of traditional time segmentation and temporal biclustering methods, we propose a causal process model (CPM) to study biological phase transitions in a systematic manner: first, we perform gene-specific segmentation of time-course expression data using a new boundary gene estimation scheme, and then we infer functional cascade dynamics by constructing a temporal block network. After computational validation on synthetic data, CPM was used to analyze the well-known yeast cell cycle data. We found that the dynamics of the boundary genes are periodic and consistent with the phases of the cell cycle, and that the temporal block network indeed shows a meaningful cascade structure of the enriched biological functions. In addition, we studied protein modules based on the temporal block network, which reflect temporal features in different cycles.
Conclusions: These results demonstrate that CPM is effective and efficient compared to traditional methods, and is able to elucidate essential regulatory mechanisms of a biological system even with complicated nonlinear phase transitions.
Affiliation(s)
- Tao Zeng
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
43
|
Li W, Wang R, Bai L, Yan Z, Sun Z. Cancer core modules identification through genomic and transcriptomic changes correlation detection at network level. BMC Syst Biol 2012; 6:64. [PMID: 22691569 PMCID: PMC3443057 DOI: 10.1186/1752-0509-6-64] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/06/2012] [Accepted: 06/12/2012] [Indexed: 02/04/2023]
Abstract
BACKGROUND: Identification of driver mutations among numerous genomic alterations remains a critical challenge to the elucidation of the underlying mechanisms of cancer. Because driver mutations by definition are associated with a greater number of cancer phenotypes compared to other mutations, we hypothesized that driver mutations could more easily be identified once the genotype-phenotype correlations are detected across tumor samples. RESULTS: In this study, we describe a novel network analysis to identify driver mutations by integrating both cancer genomes and transcriptomes. Our method successfully identified a significant genotype-phenotype change correlation in all six solid tumor types and revealed core modules that contain both significantly enriched somatic mutations and aberrant expression changes specific to tumor development. Moreover, we found that the majority of these core modules contained well-known cancer driver mutations, and that their mutated genes tended to occur at hub genes with central regulatory roles. Of these mutated genes, the majority were cancer-type specific and exhibited a closer relationship within the same cancer type than across cancer types. The remaining mutated genes, which exist in multiple cancer types, led to two cancer type clusters: one cluster consisted of three neural derived or related cancer types, and the other consisted of two adenoma cancer types. CONCLUSIONS: Our approach can successfully identify candidate drivers from the core modules. Comprehensive network analysis of the core modules potentially provides critical insights into convergent cancer development in different organs.
Affiliation(s)
- Wenting Li
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Institute of Bioinformatics and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
|
44
|
Zhang S, Chang Z, Li Z, DuanMu H, Li Z, Li K, Liu Y, Qiu F, Xu Y. Calculating phenotypic similarity between genes using hierarchical structure data based on semantic similarity. Gene 2012; 497:58-65. [PMID: 22305981 DOI: 10.1016/j.gene.2012.01.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/19/2011] [Revised: 01/16/2012] [Accepted: 01/18/2012] [Indexed: 01/25/2023]
Abstract
Phenotypic similarity is correlated with a number of measures of gene function, such as relatedness at the level of direct protein-protein interaction. The phenotypic effect of a deleted or mutated gene, which is one part of gene annotation, has attracted broad attention. However, few measures have been used to study phenotypic similarity with data from the Human Phenotype Ontology (HPO) database, so more analogous measures should be developed and investigated. We used five semantic similarity-based measures (Jiang and Conrath, Lin, Schlicker, Yu and Wu) to calculate the human phenotypic similarity between genes (PSG) with data from the HPO database, and evaluated their accuracy using information on protein-protein interaction, protein complex, protein family, gene function or DNA sequence. Compared with randomly selected gene pairs, the results of these methods were statistically significant (all P<0.001). Furthermore, we assessed the performance of these five measures by receiver operating characteristic (ROC) curve analysis, and found that most of them performed better than previous methods. This work shows that these semantic similarity-based measures for calculating PSG are effective for hierarchical structure data. Our study contributes to the development and optimization of novel algorithms for PSG calculation and provides researchers with more alternative methods, as well as tools and directions for PSG study.
Affiliation(s)
- Shanzhen Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
|
45
|
Nacher JC, Schwartz JM. Modularity in protein complex and drug interactions reveals new polypharmacological properties. PLoS One 2012; 7:e30028. [PMID: 22279562 PMCID: PMC3261189 DOI: 10.1371/journal.pone.0030028] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 07/15/2011] [Accepted: 12/12/2011] [Indexed: 11/18/2022]
Abstract
Recent studies have highlighted the importance of interconnectivity in a large range of molecular and human disease-related systems. Network medicine has emerged as a new paradigm to deal with complex diseases. Connections between protein complexes and key diseases have been suggested for decades. However, it was not until recently that protein complexes were identified and classified in sufficient numbers to carry out a large-scale analysis of the human protein complex system. We here present the first systematic and comprehensive set of relationships between protein complexes and associated drugs and analyze their topological features. The network structure is characterized by a high modularity, both in the bipartite graph and in its projections, indicating that its topology is highly distinct from a random network and that it contains a rich and heterogeneous internal modular structure. To unravel the relationships between modules of protein complexes, drugs and diseases, we investigated in depth the origins of this modular structure in examples of particular diseases. This analysis unveils new associations between diseases and protein complexes and highlights the potential role of polypharmacological drugs, which target multiple cellular functions to combat complex diseases driven by gain-of-function mutations.
Affiliation(s)
- Jose C Nacher
- Department of Complex and Intelligent Systems, Future University Hakodate, Hokkaido, Japan.
|
46
|
Musso G, Emili A, Zhang Z. Characterization and evolutionary analysis of protein-protein interaction networks. Methods Mol Biol 2012; 856:363-380. [PMID: 22399467 DOI: 10.1007/978-1-61779-585-5_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
While researchers have known the importance of the protein-protein interaction for decades, recent innovations in large-scale screening techniques have caused a shift in the paradigm of protein function analysis. Where the focus was once on the individual protein, attention is now directed to the surrounding network of protein associations. As protein interaction networks can provide useful insights into the potential function of and phenotypes associated with proteins, the increasing availability of large-scale protein interaction data suggests that molecular biologists can extract more meaningful hypotheses through examination of these large networks. Further, increasing availability of high-quality protein interaction data in multiple species has allowed interpretation of the properties of networks (i.e., the presence of hubs and modularity) from an evolutionary perspective. In this chapter, we discuss major previous findings derived from analyses of large-scale protein interaction data, focusing on approaches taken by landmark assays in evaluating the structure and evolution of these networks. We then outline basic techniques for protein interaction network analysis with the goal of pointing out the benefits and potential limitations of these approaches. As the majority of large-scale protein interaction data has been generated in budding yeast, literature described here focuses on this important model organism with references to other species included where possible.
Affiliation(s)
- Gabriel Musso
- Cardiovascular Division, Brigham & Women's Hospital, Boston, MA, USA.
|
47
|
Landry CR, Rifkin SA. The genotype-phenotype maps of systems biology and quantitative genetics: distinct and complementary. Adv Exp Med Biol 2012; 751:371-98. [PMID: 22821467 DOI: 10.1007/978-1-4614-3567-9_17] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]
Abstract
The processes by which genetic variation in complex traits is generated and maintained in populations have for a long time been treated in abstract and statistical terms. As a consequence, quantitative genetics has provided limited insights into the molecular bases of quantitative trait variation. With the developing technological and conceptual tools of systems biology, cellular and molecular processes are being described in greater detail. While we have a good description of how signaling and other molecular networks are organized in the cell, we still do not know how genetic variation affects these pathways, because systems and molecular biology usually ignore the type and extent of genetic variation found in natural populations. Here we discuss the quantitative genetics and systems biology approaches to the study of complex trait architecture and explain why these two disciplines would synergize with each other to answer questions that neither could answer alone.
|
48
|
Hwang S, Rhee SY, Marcotte EM, Lee I. Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network. Nat Protoc 2011; 6:1429-42. [PMID: 21886106 DOI: 10.1038/nprot.2011.372] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests.
Affiliation(s)
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
|
49
|
Affiliation(s)
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie and Regroupement stratégique sur la fonction, la structure et l'ingénierie des protéines (PROTEO), Université Laval, Québec, Québec, G1V 0A6, Canada.
|
50
|
Vidal M, Cusick ME, Barabási AL. Interactome networks and human disease. Cell 2011; 144:986-98. [PMID: 21414488 DOI: 10.1016/j.cell.2011.02.016] [Citation(s) in RCA: 1121] [Impact Index Per Article: 86.2] [Received: 12/08/2010] [Revised: 02/07/2011] [Accepted: 02/09/2011] [Indexed: 02/06/2023]
Abstract
Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
Affiliation(s)
- Marc Vidal
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
|