1
Xu X, Qi Z, Wang L, Zhang M, Geng Z, Han X. Gsw-fi: a GLM model incorporating shrinkage and double-weighted strategies for identifying cancer driver genes with functional impact. BMC Bioinformatics 2024; 25:99. [PMID: 38448819 PMCID: PMC10916024 DOI: 10.1186/s12859-024-05707-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND Cancer, a disease with high morbidity and mortality rates, poses a significant threat to human health. Driver genes, which harbor mutations accountable for the initiation and progression of tumors, play a crucial role in cancer development. Identifying driver genes stands as a paramount objective in cancer research and precision medicine. RESULTS In the present work, we propose a method for identifying driver genes using a Generalized Linear Regression Model (GLM) with Shrinkage and double-Weighted strategies based on Functional Impact, which is named GSW-FI. Firstly, an estimating model is proposed for assessing the background functional impacts of genes based on GLM, utilizing gene features as predictors. Secondly, the shrinkage and double-weighted strategies as two revising approaches are integrated to ensure the rationality of the identified driver genes. Lastly, a statistical method of hypothesis testing is designed to identify driver genes by leveraging the estimated background functional impacts. Experimental results conducted on 31 The Cancer Genome Atlas datasets demonstrate that GSW-FI outperforms ten other prediction methods in terms of the overlap fraction with well-known databases and consensus predictions among different methods. CONCLUSIONS GSW-FI presents a novel approach that efficiently identifies driver genes with functional impact mutations using computational methods, thereby advancing the development of precision medicine for cancer.
Affiliation(s)
- Xiaolu Xu
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian, China
- Zitong Qi
  - Department of Statistics, University of Washington, Seattle, USA
- Lei Wang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China
- Meiwei Zhang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China
- Zhaohong Geng
  - Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Xiumei Han
  - College of Artificial Intelligence, Dalian Maritime University, Dalian, China
2
Liu J, Ma F, Zhu Y, Zhang N, Kong L, Mi J, Cong H, Gao R, Wang M, Zhang Y. MaxCLK: discovery of cancer driver genes via maximal clique and information entropy of modules. Bioinformatics 2023; 39:btad737. [PMID: 38065693 PMCID: PMC10739565 DOI: 10.1093/bioinformatics/btad737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 11/10/2023] [Accepted: 12/07/2023] [Indexed: 12/23/2023] Open
Abstract
MOTIVATION Cancer is caused by the accumulation of somatic mutations in multiple pathways, in which driver mutations typically show high coverage and high exclusivity across patients. Identifying cancer driver genes has a pivotal role in understanding the mechanisms of oncogenesis and treatment. RESULTS Here, we introduced MaxCLK, an algorithm for identifying cancer driver genes, which was developed by an integrated analysis of somatic mutation data and protein-protein interaction (PPI) networks and further improved by an information entropy index. Tested on pancancer and single cancers, MaxCLK outperformed other existing methods with higher accuracy. In the pancancer analysis, we predicted 154 driver genes and 787 driver modules. The analysis of co-occurrence and exclusivity between modules and pathways reveals the correlation of their combinations. Overall, our study has deepened the understanding of driver mechanisms in PPI topology and found novel driver genes. AVAILABILITY AND IMPLEMENTATION The source codes for MaxCLK are freely available at https://github.com/ShandongUniversityMasterMa/MaxCLK-main.
Affiliation(s)
- Jian Liu
  - School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China
- Fubin Ma
  - School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China
- Yongdi Zhu
  - School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China
- Lingming Kong
  - Marine College, Shandong University at Weihai, Weihai, Shandong 264209, China
- Jia Mi
  - Precision Medicine Research Center, School of Pharmacy, Binzhou Medical University, Yantai, Shandong 264003, China
- Haiyan Cong
  - Department of Central Lab, Weihai Municipal Hospital, Weihai, Shandong 264209, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, Shandong 250100, China
- Mingyi Wang
  - Department of Central Lab, Weihai Municipal Hospital, Weihai, Shandong 264209, China
  - Department of Central Lab, Weihai Municipal Hospital, Cheeloo College of Medicine, Shandong University, Weihai, Shandong 264200, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University at Weihai, Weihai, Shandong 264209, China
3
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:cancers15071958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
  - The Wistar Institute, Philadelphia, PA 19104, USA
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
4
Chen HH, Hsueh CW, Lee CH, Hao TY, Tu TY, Chang LY, Lee JC, Lin CY. SWEET: a single-sample network inference method for deciphering individual features in disease. Brief Bioinform 2023; 24:7017366. [PMID: 36719112 PMCID: PMC10025435 DOI: 10.1093/bib/bbad032] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/13/2022] [Revised: 01/05/2023] [Accepted: 01/14/2023] [Indexed: 02/01/2023] Open
Abstract
Recently, extracting inherent biological system information (e.g. cellular networks) from genome-wide expression profiles for developing personalized diagnostic and therapeutic strategies has become increasingly important. However, accurately constructing single-sample networks (SINs) to capture individual characteristics and heterogeneity in disease remains challenging. Here, we propose a sample-specific-weighted correlation network (SWEET) method to model SINs by integrating the genome-wide sample-to-sample correlation (i.e. sample weights) with the differential network between perturbed and aggregate networks. For a group of samples, the genome-wide sample weights can be assessed without prior knowledge of intrinsic subpopulations to address the network edge number bias caused by sample size differences. Compared with the state-of-the-art SIN inference methods, the SWEET SINs in 16 cancers more likely fit the scale-free property, display higher overlap with the human interactomes and perform better in identifying three types of cancer-related genes. Moreover, integrating SWEET SINs with a network proximity measure facilitates characterizing individual features and therapy in diseases, such as somatic mutation, mut-driver and essential genes. Biological experiments further validated two candidate repurposable drugs, albendazole for head and neck squamous cell carcinoma (HNSCC) and lung adenocarcinoma (LUAD) and encorafenib for HNSCC. By applying SWEET, we also identified two possible LUAD subtypes that exhibit distinct clinical features and molecular mechanisms. Overall, the SWEET method complements current SIN inference and analysis methods and presents a view of biological systems at the network level to offer numerous clues for further investigation and clinical translation in network medicine and precision medicine.
Affiliation(s)
- Hsin-Hua Chen
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chun-Wei Hsueh
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hwa Lee
  - School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan
  - TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ting-Yi Hao
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzu-Ying Tu
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Lan-Yun Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Jih-Chin Lee
  - Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
- Chun-Yu Lin
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
5
Cheng X, Amanullah M, Liu W, Liu Y, Pan X, Zhang H, Xu H, Liu P, Lu Y. WMDS.net: a network control framework for identifying key players in transcriptome programs. Bioinformatics 2023; 39:7023921. [PMID: 36727489 PMCID: PMC9925106 DOI: 10.1093/bioinformatics/btad071] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Revised: 01/16/2023] [Accepted: 02/01/2023] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Mammalian cells can be transcriptionally reprogramed to other cellular phenotypes. Controllability of such complex transitions in transcriptional networks underlying cellular phenotypes is an inherent biological characteristic. This network controllability can be interpreted by operating a few key regulators to guide the transcriptional program from one state to another. Finding the key regulators in the transcriptional program can provide key insights into the network state transition underlying cellular phenotypes. RESULTS To address this challenge, here, we proposed to identify the key regulators in the transcriptional co-expression network as a minimum dominating set (MDS) of driver nodes that can fully control the network state transition. Based on the theory of structural controllability, we developed a weighted MDS network model (WMDS.net) to find the driver nodes of differential gene co-expression networks. The weight of WMDS.net integrates the degree of nodes in the network and the significance of gene co-expression difference between two physiological states into the measurement of node controllability of the transcriptional network. To confirm its validity, we applied WMDS.net to the discovery of cancer driver genes in RNA-seq datasets from The Cancer Genome Atlas. WMDS.net is powerful among various cancer datasets and outperformed the other top-tier tools with a better balance between precision and recall. AVAILABILITY AND IMPLEMENTATION https://github.com/chaofen123/WMDS.net. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiang Cheng
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Md Amanullah
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Weigang Liu
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Yi Liu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Xiaoqing Pan
  - Department of Mathematics, Shanghai Normal University, Xuhui 200234, China
- Honghe Zhang
  - Department of Pathology, Research Unit of Intelligence Classification of Tumor Pathology and Precision Therapy, Chinese Academy of Medical Sciences, Zhejiang University School of Medicine, Hangzhou 310058, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Pengyuan Liu
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Cancer Center, Zhejiang University, Hangzhou 310029, China
- Yan Lu
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310029, China
6
Nussinov R, Zhang M, Maloney R, Liu Y, Tsai CJ, Jang H. Allostery: Allosteric Cancer Drivers and Innovative Allosteric Drugs. J Mol Biol 2022; 434:167569. [PMID: 35378118 PMCID: PMC9398924 DOI: 10.1016/j.jmb.2022.167569] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/03/2022] [Revised: 03/11/2022] [Accepted: 03/25/2022] [Indexed: 01/12/2023]
Abstract
Here, we discuss the principles of allosteric activating mutations, propagation downstream of the signals that they prompt, and allosteric drugs, with examples from the Ras signaling network. We focus on Abl kinase where mutations shift the landscape toward the active, imatinib binding-incompetent conformation, likely resulting in the high affinity ATP outcompeting drug binding. Recent pharmacological innovation extends to allosteric inhibitor (GNF-5)-linked PROTAC, targeting Bcr-Abl1 myristoylation site, and broadly, allosteric heterobifunctional degraders that destroy targets, rather than inhibiting them. Designed chemical linkers in bifunctional degraders can connect the allosteric ligand that binds the target protein and the E3 ubiquitin ligase warhead anchor. The physical properties and favored conformational state of the engineered linker can precisely coordinate the distance and orientation between the target and the recruited E3. Allosteric PROTACs, noncompetitive molecular glues, and bitopic ligands, with covalent links of allosteric ligands and orthosteric warheads, increase the effective local concentration of productively oriented and placed ligands. Through covalent chemical or peptide linkers, allosteric drugs can collaborate with competitive drugs, degrader anchors, or other molecules of choice, driving innovative drug discovery.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
- Ryan Maloney
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
- Yonglan Liu
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
- Chung-Jung Tsai
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA
7
Zhang X, Zhou Y, Shi Z, Liu Z, Chen H, Wang X, Cheng Y, Xi L, Li X, Zhang C, Bao L, Xuan C. Integrated analysis of genes encoding ATP-dependent chromatin remodellers identifies CHD7 as a potential target for colorectal cancer therapy. Clin Transl Med 2022; 12:e953. [PMID: 35789070 PMCID: PMC9254903 DOI: 10.1002/ctm2.953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 06/09/2022] [Accepted: 06/15/2022] [Indexed: 11/08/2022] Open
Abstract
BACKGROUND Genes participating in chromatin organization and regulation are frequently mutated or dysregulated in cancers. ATP-dependent chromatin remodelers (ATPCRs) play a key role in organizing genomic DNA within chromatin, therefore regulating gene expression. The oncogenic role of ATPCRs and the mechanism involved remain unclear. METHODS We analyzed the genomic and transcriptional aberrations of the genes encoding ATPCRs in The Cancer Genome Atlas (TCGA) cohort. A series of cellular experiments and mouse tumor-bearing experiments were conducted to reveal the regulatory function of CHD7 on the growth of colorectal cancer cells. RNA-seq and ATAC-seq approaches together with ChIP assays were performed to elucidate the downstream targets and the molecular mechanisms. RESULTS Our data showed that many ATPCRs represented a high frequency of somatic copy number alterations, widespread somatic mutations, remarkable expression abnormalities, and significant correlation with overall survival, suggesting several somatic driver candidates including chromodomain helicase DNA-binding protein 7 (CHD7) in colorectal cancer. We experimentally demonstrated that CHD7 promotes the growth of colorectal cancer cells in vitro and in vivo. CHD7 can bind to the promoters of target genes to maintain chromatin accessibility and facilitate transcription. We found that CHD7 knockdown downregulates AK4 expression and activates AMPK phosphorylation, thereby promoting the phosphorylation and stability of p53 and leading to the inhibition of colorectal cancer growth. Our multi-omics analyses of ATPCRs across large-scale cancer specimens identified potential therapeutic targets and our experimental studies revealed a novel CHD7-AK4-AMPK-p53 axis that plays an oncogenic role in colorectal cancer.
Affiliation(s)
- Xingyan Zhang
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Yaoyao Zhou
  - Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Zhenyu Shi
  - Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Zhenfeng Liu
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Hao Chen
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Xiaochen Wang
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Yiming Cheng
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Lishan Xi
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Xuanyuan Li
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Chunze Zhang
  - Tianjin Institute of Coloproctology, Department of Colorectal Surgery, Tianjin Union Medical Center, Tianjin, China
- Li Bao
  - Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Chenghao Xuan
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
8
Zhang LQ, Liu JJ, Liu L, Fan GL, Li YN, Li QZ. The impact of gene-body H3K36me3 patterns on gene expression level changes in chronic myelogenous leukemia. Gene 2021; 802:145862. [PMID: 34352296 DOI: 10.1016/j.gene.2021.145862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2021] [Revised: 07/07/2021] [Accepted: 07/30/2021] [Indexed: 11/29/2022]
Abstract
Chronic myelogenous leukemia (CML) is a malignant clonal disease of hematopoietic stem cells. Studies have shown that the progression of CML is related to histone modifications. Here, we perform systematic analyses of H3K36me3 patterns and gene expression level changes. We observe that the genes with higher gene-body H3K36me3 levels in normal cells show fewer expression changes during leukemogenesis, while the genes with lower gene-body H3K36me3 levels in normal cells yield obvious expression changes during leukemogenesis (ρ = -0.98, P = 9.30 × 10⁻⁸). These findings are conserved in human lung/breast cancers and mouse CML, regardless of gene expression levels and gene lengths. Regulatory element analysis and Random Forest regression show that Hoxd13, Rara, Scl, Smad3, Smad4 and Tgif1 induce the up-regulation of genes with lower H3K36me3 levels (ρ = 0.97, P = 2.35 × 10⁻⁵⁶). Enrichment analysis shows that the differentially expressed genes with lower H3K36me3 levels are involved in leukemia-related pathways, such as leukocyte migration and regulation of leukocyte activation. Finally, six driver genes (Tp53, Wt1, Dnmt3a, Cacna1b, Phactr1 and Gbp4) with lower H3K36me3 levels are identified. Our analyses indicate that lower gene-body H3K36me3 levels may serve as a biomarker for the progression of CML.
Affiliation(s)
- Lu-Qiang Zhang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Jun-Jie Liu
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Li Liu
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Guo-Liang Fan
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Yan-Nan Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qian-Zhong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
  - The Research Center for Laboratory Animal Science, College of Life Sciences, Inner Mongolia University, Hohhot 010021, China
9
Khan S, Jha A, Panda AC, Dixit A. Cancer-Associated circRNA-miRNA-mRNA Regulatory Networks: A Meta-Analysis. Front Mol Biosci 2021; 8:671309. [PMID: 34055888 PMCID: PMC8149909 DOI: 10.3389/fmolb.2021.671309] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/23/2021] [Accepted: 04/13/2021] [Indexed: 01/11/2023] Open
Abstract
Recent advances in sequencing technologies and the discovery of non-coding RNAs (ncRNAs) have provided new insights into the molecular pathogenesis of cancers. Several studies have implicated ncRNAs, including microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and recently discovered circular RNAs (circRNAs), in tumorigenesis and metastasis. Unlike linear RNAs, circRNAs are highly stable and closed-loop RNA molecules. It has been established that circRNAs regulate gene expression by controlling the functions of miRNAs and RNA-binding proteins (RBPs) or by translating into proteins. The circRNA-miRNA-mRNA regulatory axis is associated with human diseases, such as cancers, Alzheimer's disease, and diabetes. In this study, we explored the interaction among circRNAs, miRNAs, and their target genes in various cancers using state-of-the-art bioinformatics tools. We identified differentially expressed circRNAs, miRNAs, and mRNAs in multiple cancers from publicly available data. Furthermore, we identified many crucial drivers and tumor suppressor genes in the circRNA-miRNA-mRNA regulatory axis in various cancers. Together, the data from this study provide a deeper understanding of the circRNA-miRNA-mRNA regulatory mechanisms in cancers.
Affiliation(s)
- Shaheerah Khan
  - Institute of Life Sciences, Bhubaneswar, India
  - Regional Centre for Biotechnology, Faridabad, India
- Atimukta Jha
  - Institute of Life Sciences, Bhubaneswar, India
  - Manipal Academy of Higher Education, Manipal, India
10
Nayak A, Kumar S, Singh SP, Bhattacharyya A, Dixit A, Roychowdhury A. Oncogenic potential of ATAD2 in stomach cancer and insights into the protein-protein interactions at its AAA + ATPase domain and bromodomain. J Biomol Struct Dyn 2021; 40:5606-5622. [PMID: 33438526 DOI: 10.1080/07391102.2021.1871959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
ATAD2 has recently been shown to promote stomach cancer. However, nothing is known about the functional network of ATAD2 in stomach carcinogenesis. This study illustrates the oncogenic potential of ATAD2 and the participation of its ATPase and bromodomain in stomach malignancy. Expression of ATAD2 in stomach cancer is analyzed by in silico and in vitro techniques including western blot and immunofluorescence microscopy of stomach cancer cells (SCCs) and tissues. The oncogenic potential of ATAD2 is examined thoroughly using genetic alterations, driver gene prediction, survival analysis, identification of interacting partners, and analysis of canonical pathways. To understand the protein-protein interactions (PPI) at residue level, molecular docking and molecular dynamics simulations (1200 ns) are performed. Enhanced expression of ATAD2 is observed in H. pylori-infected SCCs, patient biopsy tissues, and all stages and grades of stomach cancer. High expression of ATAD2 is found to be negatively correlated with the survival of stomach cancer patients. ATAD2 is a cancer driver gene with 37 mutational sites and a predictable factor for stomach cancer prognosis with high accuracy. The top canonical pathways of ATAD2 indicate its participation in stomach malignancy. The ATAD2-PPI in stomach cancer identify top-ranked partners; ESR1, SUMO2, SPTN2, and MYC show preference for the bromodomain whereas NCOA3 and HDA11 have preference for the ATPase domain of ATAD2. The oncogenic characterization of ATAD2 provides strong evidence to consider ATAD2 as a stomach cancer biomarker. These studies offer an insight for the first time into the ATAD2-PPI interface presenting a novel target for cancer therapeutics. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Aditi Nayak
- School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Odisha, India
- Sugandh Kumar
- Institute of Life Sciences, Bhubaneswar, Odisha, India
- Asima Bhattacharyya
- School of Biological Sciences, National Institute of Science Education and Research (NISER) Bhubaneswar, HBNI, Khurda, Odisha, India
- Anasuya Roychowdhury
- School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Odisha, India
|
11
|
Martinez-Ledesma E, Flores D, Trevino V. Computational methods for detecting cancer hotspots. Comput Struct Biotechnol J 2020; 18:3567-3576. [PMID: 33304455 PMCID: PMC7711189 DOI: 10.1016/j.csbj.2020.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Revised: 11/12/2020] [Accepted: 11/13/2020] [Indexed: 12/14/2022] Open
Abstract
Cancer mutations that are recurrently observed among patients are known as hotspots. Hotspots are highly relevant because they are, presumably, likely functional. Known hotspots in BRAF, PIK3CA, TP53, KRAS, IDH1 support this idea. However, hundreds of hotspots have never been validated experimentally. The detection of hotspots nevertheless is challenging because background mutations obscure their statistical and computational identification. Although several algorithms have been applied to identify hotspots, they have not been reviewed before. Thus, in this mini-review, we summarize more than 40 computational methods applied to detect cancer hotspots in coding and non-coding DNA. We first organize the methods in cluster-based, 3D, position-specific, and miscellaneous to provide a general overview. Then, we describe their embed procedures, implementations, variations, and differences. Finally, we discuss some advantages, provide some ideas for future developments, and mention opportunities such as application to viral integrations, translocations, and epigenetics.
Affiliation(s)
- Emmanuel Martinez-Ledesma
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- David Flores
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- Universidad del Caribe, Departamento de Ciencias Básicas e Ingenierías, Cancún, Quintana Roo, Mexico
- Victor Trevino
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
|
12
|
Gu H, Xu X, Qin P, Wang J. FI-Net: Identification of Cancer Driver Genes by Using Functional Impact Prediction Neural Network. Front Genet 2020; 11:564839. [PMID: 33244318 PMCID: PMC7683798 DOI: 10.3389/fgene.2020.564839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/22/2020] [Accepted: 09/30/2020] [Indexed: 12/24/2022] Open
Abstract
Identification of driver genes, whose mutations cause the development of tumors, is crucial for the improvement of cancer research and precision medicine. To overcome the problem that the traditional frequency-based methods cannot detect lowly recurrently mutated driver genes, researchers have focused on the functional impact of gene mutations and proposed the function-based methods. However, most of the function-based methods estimate the distribution of the null model through the non-parametric method, which is sensitive to sample size. Besides, such methods could probably lead to underselection or overselection results. In this study, we proposed a method to identify driver genes by using functional impact prediction neural network (FI-net). An artificial neural network as a parametric model was constructed to estimate the functional impact scores for genes, in which multi-omics features were used as the multivariate inputs. Then the estimation of the background distribution and the identification of driver genes were conducted in each cluster obtained by the hierarchical clustering algorithm. We applied FI-net and other 22 state-of-the-art methods to 31 datasets from The Cancer Genome Atlas project. According to the comprehensive evaluation criterion, FI-net was powerful among various datasets and outperformed the other methods in terms of the overlap fraction with Cancer Gene Census and Network of Cancer Genes database, and the consensus in predictions among methods. Furthermore, the results illustrated that FI-net can identify known and potential novel driver genes.
Affiliation(s)
- Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Xiaolu Xu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Pan Qin
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Jia Wang
- Department of Breast Surgery, Institute of Breast Disease, Second Hospital of Dalian Medical University, Dalian, China
|
13
|
Kobren SN, Chazelle B, Singh M. PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities. Cell Syst 2020; 11:63-74.e7. [PMID: 32711844 PMCID: PMC7493809 DOI: 10.1016/j.cels.2020.06.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2019] [Revised: 02/23/2020] [Accepted: 06/05/2020] [Indexed: 12/12/2022]
Abstract
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event. A fast, analytical framework called PertInInt enables efficient integration of multiple measures of protein site functionality—including interaction, domain, and evolutionary conservation—with gene-level mutation data in order to rapidly detect cancer driver genes along with their disrupted functionalities.
Affiliation(s)
- Shilpa Nadimpalli Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
|
14
|
Huang Y, Chang X, Zhang Y, Chen L, Liu X. Disease characterization using a partial correlation-based sample-specific network. Brief Bioinform 2020; 22:5838457. [PMID: 32422654 DOI: 10.1093/bib/bbaa062] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/16/2019] [Revised: 03/25/2020] [Accepted: 03/26/2020] [Indexed: 12/23/2022] Open
Abstract
A single-sample network (SSN) is a biological molecular network constructed from single-sample data given a reference dataset and can provide insights into the mechanisms of individual diseases and aid in the development of personalized medicine. In this study, we proposed a computational method, a partial correlation-based single-sample network (P-SSN), which not only infers a network from each single-sample data given a reference dataset but also retains the direct interactions by excluding indirect interactions (https://github.com/hyhRise/P-SSN). By applying P-SSN to analyze tumor data from the Cancer Genome Atlas and single cell data, we validated the effectiveness of P-SSN in predicting driver mutation genes (DMGs), producing network distance, identifying subtypes and further classifying single cells. In particular, P-SSN is highly effective in predicting DMGs based on single-sample data. P-SSN is also efficient for subtyping complex diseases and for clustering single cells by introducing network distance between any two samples.
Affiliation(s)
- Yanhong Huang
- Institute of Statistics and Applied Mathematics, Anhui University of Finance & Economics, Bengbu 233030, China, and School of Mathematics and Statistics, Shandong University at Weihai, Weihai 264209, China
- Xiao Chang
- Institute of Statistics and Applied Mathematics, Anhui University of Finance & Economics, Bengbu 233030, China
- Yu Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai 264209, China
- Luonan Chen
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China, Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China, Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai 201210, China, and Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Xiaoping Liu
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai 264209, China
|
15
|
Han Y, Yang J, Qian X, Cheng WC, Liu SH, Hua X, Zhou L, Yang Y, Wu Q, Liu P, Lu Y. DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res 2019; 47:e45. [PMID: 30773592 PMCID: PMC6486576 DOI: 10.1093/nar/gkz096] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 01/15/2019] [Accepted: 02/04/2019] [Indexed: 12/24/2022] Open
Abstract
Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao’s score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
Affiliation(s)
- Yi Han
- Center for Uterine Cancer Diagnosis and Therapy Research of Zhejiang Province, Women's Reproductive Health Key Laboratory of Zhejiang Province, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310006, China
- Juze Yang
- Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Xinyi Qian
- Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Wei-Chung Cheng
- Graduate Institute of Biomedical Sciences, Research Center for Tumor Medical Science, and Drug Development Center, China Medical University, Taichung 40402, Taiwan
- Shu-Hsuan Liu
- Graduate Institute of Biomedical Sciences, Research Center for Tumor Medical Science, and Drug Development Center, China Medical University, Taichung 40402, Taiwan
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Liyuan Zhou
- Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Yaning Yang
- Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui 230026, China
- Qingbiao Wu
- Department of Mathematics, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Pengyuan Liu
- Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310016, China
- Yan Lu
- Center for Uterine Cancer Diagnosis and Therapy Research of Zhejiang Province, Women's Reproductive Health Key Laboratory of Zhejiang Province, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310006, China
|
16
|
Nussinov R, Tsai C, Jang H. Autoinhibition can identify rare driver mutations and advise pharmacology. FASEB J 2019; 34:16-29. [DOI: 10.1096/fj.201901341r] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/28/2019] [Revised: 09/18/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022]
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Chung‐Jung Tsai
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
|
17
|
Functional characterization of 3D protein structures informed by human genetic diversity. Proc Natl Acad Sci U S A 2019; 116:8960-8965. [PMID: 30988206 PMCID: PMC6500140 DOI: 10.1073/pnas.1820813116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023] Open
Abstract
Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.
|
18
|
Review: Precision medicine and driver mutations: Computational methods, functional assays and conformational principles for interpreting cancer drivers. PLoS Comput Biol 2019; 15:e1006658. [PMID: 30921324 PMCID: PMC6438456 DOI: 10.1371/journal.pcbi.1006658] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 12/19/2022] Open
Abstract
At the root of the so-called precision medicine or precision oncology, which is our focus here, is the hypothesis that cancer treatment would be considerably better if therapies were guided by a tumor’s genomic alterations. This hypothesis has sparked major initiatives focusing on whole-genome and/or exome sequencing, creation of large databases, and developing tools for their statistical analyses—all aspiring to identify actionable alterations, and thus molecular targets, in a patient. At the center of the massive amount of collected sequence data is their interpretations that largely rest on statistical analysis and phenotypic observations. Statistics is vital, because it guides identification of cancer-driving alterations. However, statistics of mutations do not identify a change in protein conformation; therefore, it may not define sufficiently accurate actionable mutations, neglecting those that are rare. Among the many thematic overviews of precision oncology, this review innovates by further comprehensively including precision pharmacology, and within this framework, articulating its protein structural landscape and consequences to cellular signaling pathways. It provides the underlying physicochemical basis, thereby also opening the door to a broader community.
|
19
|
Rajendran BK, Deng CX. Characterization of potential driver mutations involved in human breast cancer by computational approaches. Oncotarget 2018; 8:50252-50272. [PMID: 28477017 PMCID: PMC5564847 DOI: 10.18632/oncotarget.17225] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 02/08/2017] [Accepted: 03/26/2017] [Indexed: 02/06/2023] Open
Abstract
Breast cancer is the second most frequently occurring form of cancer and is also the second most lethal cancer in women worldwide. A genetic mutation is one of the key factors that alter multiple cellular regulatory pathways and drive breast cancer initiation and progression yet nature of these cancer drivers remains elusive. In this article, we have reviewed various computational perspectives and algorithms for exploring breast cancer driver mutation genes. Using both frequency based and mutational exclusivity based approaches, we identified 195 driver genes and shortlisted 63 of them as candidate drivers for breast cancer using various computational approaches. Finally, we conducted network and pathway analysis to explore their functions in breast tumorigenesis including tumor initiation, progression, and metastasis.
Affiliation(s)
- Barani Kumar Rajendran
- Cancer Research Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Chu-Xia Deng
- Cancer Research Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
|
20
|
Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 2017; 14:782-788. [PMID: 28714987 DOI: 10.1038/nmeth.4364] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 01/18/2017] [Accepted: 06/16/2017] [Indexed: 12/19/2022]
Abstract
Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally, most algorithms for cancer-driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for nonrandom distribution of mutations within proteins as a signal for the driving role of mutations in cancer. Here we classify and review such subgene-resolution algorithms, compare their findings on four distinct cancer data sets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes.
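As a rough illustration of the frequency-based idea this abstract mentions (genes with more mutations than expected from an average background mutation rate), the sketch below computes a one-sided binomial tail probability. It is not taken from the cited paper: the function names, the per-base per-sample rate, and the assumption that every base in every sample is an independent mutation opportunity are all simplifications introduced here.

```python
import math


def log_binomial_pmf(i: int, n: int, p: float) -> float:
    """Log of P(X = i) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    mean = n * p
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binomial_pmf(i, n, p))
        total += term
        # Past the mean the terms only shrink, so stop once they are negligible.
        if i > mean and term < total * 1e-17:
            break
    return min(total, 1.0)


def excess_mutation_pvalue(observed: int, gene_length: int,
                           n_samples: int, background_rate: float) -> float:
    """Hypothetical helper: p-value for seeing at least `observed` mutations
    in a gene when each base in each sample mutates independently at
    `background_rate` (an assumed, uniform background model)."""
    opportunities = gene_length * n_samples
    return binomial_upper_tail(observed, opportunities, background_rate)
```

A small p-value from `excess_mutation_pvalue` would flag a gene as mutated more often than this simple background model predicts; real tools typically replace the uniform rate with gene- and context-specific estimates.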
|
21
|
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression. PLoS Comput Biol 2017; 13:e1005347. [PMID: 28170390 PMCID: PMC5321471 DOI: 10.1371/journal.pcbi.1005347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/10/2016] [Revised: 02/22/2017] [Accepted: 01/04/2017] [Indexed: 12/22/2022] Open
Abstract
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
|
22
|
Xi J, Wang M, Li A. Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information. Mol Biosyst 2017; 13:2135-2144. [DOI: 10.1039/c7mb00303j] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 02/01/2023]
Abstract
An integrated approach to identify driver genes based on information of somatic mutations, the interaction network and Gene Ontology similarity.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
|
23
|
Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes. Sci Rep 2016; 6:26483. [PMID: 27225414 PMCID: PMC4880911 DOI: 10.1038/srep26483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/26/2015] [Accepted: 05/03/2016] [Indexed: 01/26/2023] Open
Abstract
Protein tertiary structure determines molecular function, interaction, and stability of the protein, therefore distribution of mutation in the tertiary structure can facilitate the identification of new driver genes in cancer. To analyze mutation distribution in protein tertiary structures, we applied a novel three dimensional permutation test to the mutation positions. We analyzed somatic mutation datasets of 21 types of cancers obtained from exome sequencing conducted by the TCGA project. Of the 3,622 genes that had ≥3 mutations in the regions with tertiary structure data, 106 genes showed significant skew in mutation distribution. Known tumor suppressors and oncogenes were significantly enriched in these identified cancer gene sets. Physical distances between mutations in known oncogenes were significantly smaller than those of tumor suppressors. Twenty-three genes were detected in multiple cancers. Candidate genes with significant skew of the 3D mutation distribution included kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1) and transcription factors (KLF5 and EEF1B2). Our study suggests that systematic analysis of mutation distribution in the tertiary protein structure can help identify cancer driver genes.
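To make the idea of a three-dimensional permutation test more concrete, here is a minimal sketch under assumptions that are not from the cited paper: the test statistic (mean pairwise Euclidean distance between mutated residues), the null model (drawing the same number of residues uniformly, with replacement, from all residues with coordinates), and every function name are illustrative choices.

```python
import itertools
import math
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of 3D points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_pvalue(coords, mutated, n_perm=2000, seed=0):
    """Hypothetical one-sided permutation test for spatial clustering.

    coords  : dict mapping residue id -> (x, y, z)
    mutated : list of residue ids carrying mutations (repeats allowed,
              at least two entries)
    Returns the fraction of random placements whose mean pairwise distance
    is at most the observed one, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([coords[r] for r in mutated])
    residues = list(coords)
    hits = 0
    for _ in range(n_perm):
        # Null model (assumed): mutations land on residues uniformly at random.
        sample = rng.choices(residues, k=len(mutated))
        if mean_pairwise_distance([coords[r] for r in sample]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a small p-value means the mutated residues sit closer together in the structure than random placements usually do, which is the kind of skew in mutation distribution the abstract describes.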
|
24
|
Tokheim C, Bhattacharya R, Niknafs N, Gygax DM, Kim R, Ryan M, Masica DL, Karchin R. Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure. Cancer Res 2016; 76:3719-31. [PMID: 27197156 DOI: 10.1158/0008-5472.can-15-3190] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 11/30/2015] [Accepted: 04/01/2016] [Indexed: 12/12/2022]
Abstract
The impact of somatic missense mutation on cancer etiology and progression is often difficult to interpret. One common approach for assessing the contribution of missense mutations in carcinogenesis is to identify genes mutated with statistically nonrandom frequencies. Even given the large number of sequenced cancer samples currently available, this approach remains underpowered to detect drivers, particularly in less studied cancer types. Alternative statistical and bioinformatic approaches are needed. One approach to increase power is to focus on localized regions of increased missense mutation density or hotspot regions, rather than a whole gene or protein domain. Detecting missense mutation hotspot regions in three-dimensional (3D) protein structure may also be beneficial because linear sequence alone does not fully describe the biologically relevant organization of codons. Here, we present a novel and statistically rigorous algorithm for detecting missense mutation hotspot regions in 3D protein structures. We analyzed approximately 3 × 10^5 mutations from The Cancer Genome Atlas (TCGA) and identified 216 tumor-type-specific hotspot regions. In addition to experimentally determined protein structures, we considered high-quality structural models, which increase genomic coverage from approximately 5,000 to more than 15,000 genes. We provide new evidence that 3D mutation analysis has unique advantages. It enables discovery of hotspot regions in many more genes than previously shown and increases sensitivity to hotspot regions in tumor suppressor genes (TSG). Although hotspot regions have long been known to exist in both TSGs and oncogenes, we provide the first report that they have different characteristic properties in the two types of driver genes. We show how cancer researchers can use our results to link 3D protein structure and the biologic functions of missense mutations in cancer, and to generate testable hypotheses about driver mechanisms. Our results are included in a new interactive website for visualizing protein structures with TCGA mutations and associated hotspot regions. Users can submit new sequence data, facilitating the visualization of mutations in a biologically relevant context. Cancer Res; 76(13); 3719-31. ©2016 AACR.
Affiliation(s)
- Collin Tokheim
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
- Rohit Bhattacharya
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
- Noushin Niknafs
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
- Rick Kim
- In Silico Solutions, Fairfax, Virginia
- David L Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
- Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland. Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland.
|
25
|
Ryslik GA, Cheng Y, Modis Y, Zhao H. Leveraging protein quaternary structure to identify oncogenic driver mutations. BMC Bioinformatics 2016; 17:137. [PMID: 27001666 PMCID: PMC4802602 DOI: 10.1186/s12859-016-0963-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/14/2015] [Accepted: 02/18/2016] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Identifying key "driver" mutations which are responsible for tumorigenesis is critical in the development of new oncology drugs. Due to multiple pharmacological successes in treating cancers that are caused by such driver mutations, a large body of methods have been developed to differentiate these mutations from the benign "passenger" mutations which occur in the tumor but do not further progress the disease. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html . CONCLUSION QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure.
Affiliation(s)
- Gregory A. Ryslik
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
| | - Yuwei Cheng
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
| | - Yorgo Modis
- Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH UK
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
| |
|
26
|
Meyer MJ, Lapcevic R, Romero AE, Yoon M, Das J, Beltrán JF, Mort M, Stenson PD, Cooper DN, Paccanaro A, Yu H. mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Hum Mutat 2016; 37:447-56. [PMID: 26841357 DOI: 10.1002/humu.22963] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 10/27/2015] [Accepted: 01/14/2016] [Indexed: 12/20/2022]
Abstract
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported.
Affiliation(s)
- Michael J Meyer
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853; Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, 10065
| | - Ryan Lapcevic
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Mark Yoon
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Juan Felipe Beltrán
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| |
|
27
|
Chung IF, Chen CY, Su SC, Li CY, Wu KJ, Wang HW, Cheng WC. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res 2015; 44:D975-9. [PMID: 26635391 PMCID: PMC4702919 DOI: 10.1093/nar/gkv1314] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 09/15/2015] [Accepted: 11/10/2015] [Indexed: 11/30/2022] Open
Abstract
We previously presented DriverDB, a database that incorporates ∼6000 cases of exome-seq data, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. The database provides two points of view, ‘Cancer’ and ‘Gene’, to help researchers visualize the relationships between cancers and driver genes/mutations. In the updated DriverDBv2 database (http://ngs.ym.edu.tw/driverdb) presented herein, we incorporated >9500 cancer-related RNA-seq datasets and >7000 more exome-seq datasets from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), and published papers. Seven additional computational algorithms (meaning that the updated database contains 15 in total), which were developed for driver gene identification, are incorporated into our analysis pipeline, and the results are provided in the ‘Cancer’ section. Furthermore, there are two main new features, ‘Expression’ and ‘Hotspot’, in the ‘Gene’ section. ‘Expression’ displays two expression profiles of a gene in terms of sample types and mutation types, respectively. ‘Hotspot’ indicates the hotspot mutation regions of a gene according to the results provided by four bioinformatics tools. A new function, ‘Gene Set’, allows users to investigate the relationships among mutations, expression levels and clinical data for a set of genes, a specific dataset and clinical features.
Affiliation(s)
- I-Fang Chung
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Center for Systems and Synthetic Biology, National Yang-Ming University, Taipei, 11221, Taiwan
| | - Chen-Yang Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan
| | - Shih-Chieh Su
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan
| | - Chia-Yang Li
- Department of Genome Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; Center for Infectious Disease and Cancer Research, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
| | - Kou-Juey Wu
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan; Graduate Institute of Cancer Biology, China Medical University, Taichung, 40402, Taiwan
| | - Hsei-Wei Wang
- VGH-YM Genomic Research Center, National Yang-Ming University, Taipei 11221, Taiwan; Institute of Clinical Medicine, Medical College, National Yang-Ming University, Taipei 11221, Taiwan; Institute of Microbiology and Immunology, National Yang-Ming University, Taipei 11221, Taiwan; Department of Education and Research, Taipei City Hospital, Taipei 10341, Taiwan
| | - Wei-Chung Cheng
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan; Graduate Institute of Cancer Biology, China Medical University, Taichung, 40402, Taiwan
| |
|
28
|
Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A. A Pan-Cancer Catalogue of Cancer Driver Protein Interaction Interfaces. PLoS Comput Biol 2015; 11:e1004518. [PMID: 26485003 PMCID: PMC4616621 DOI: 10.1371/journal.pcbi.1004518] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 07/29/2015] [Accepted: 08/21/2015] [Indexed: 12/19/2022] Open
Abstract
Despite their importance in maintaining the integrity of all cellular pathways, the role of mutations on protein-protein interaction (PPI) interfaces as cancer drivers has not been systematically studied. Here we analyzed the mutation patterns of the PPI interfaces from 10,028 proteins in a pan-cancer cohort of 5,989 tumors from 23 projects of The Cancer Genome Atlas (TCGA) to find interfaces enriched in somatic missense mutations. To that end we use e-Driver, an algorithm to analyze the mutation distribution of specific protein functional regions. We identified 103 PPI interfaces enriched in somatic cancer mutations. 32 of these interfaces are found in proteins coded by known cancer driver genes. The remaining 71 interfaces are found in proteins that have not been previously identified as cancer drivers even though, in most cases, there is extensive literature suggesting they play an important role in cancer. Finally, we integrate these findings with clinical information to show how tumors apparently driven by the same gene have different behaviors, including patient outcomes, depending on which specific interfaces are mutated. Until now, most efforts in cancer genomics have focused on identifying genes and pathways driving tumor development. Although this has been unquestionably a success, as evidenced by the fact that we now have an extensive catalogue of cancer driver genes and pathways, there is still a poor understanding of why patients with the same affected driver genes may have different disease outcomes or drug responses. This is precisely the aim of this work: to show how, by considering proteins as multifunctional factories instead of monolithic black boxes, it is possible to identify novel cancer driver genes and propose molecular hypotheses to explain such heterogeneity. 
To that end we have mapped the mutation profiles of 5,989 cancer patients from TCGA to more than 10,000 protein structures, leading us to identify 103 protein interaction interfaces enriched in somatic mutations. Finally, we have integrated clinical annotations as well as proteomics data to show how tumors apparently driven by the same gene can display different behaviors, including patient outcomes, depending on which specific interfaces are mutated.
Affiliation(s)
- Eduard Porta-Pardo
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Luz Garcia-Alonso
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
| | - Thomas Hrabe
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Joaquin Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Functional Genomics Node, (INB) at CIPF, Valencia, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
| | - Adam Godzik
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| |
|
29
|
Cheng F, Zhao J, Zhao Z. Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform 2015; 17:642-56. [PMID: 26307061 DOI: 10.1093/bib/bbv068] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Received: 05/31/2015] [Indexed: 12/27/2022] Open
Abstract
Cancer is often driven by the accumulation of genetic alterations, including single nucleotide variants, small insertions or deletions, gene fusions, copy-number variations, and large chromosomal rearrangements. Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data and catalog somatic mutations in both common and rare cancer types. So far, the somatic mutation landscapes and signatures of >10 major cancer types have been reported; however, pinpointing driver mutations and cancer genes from millions of available cancer somatic mutations remains a monumental challenge. To tackle this important task, many methods and computational tools have been developed during the past several years and, thus, a review of these advances is urgently needed. Here, we first summarize the main features of these methods and tools for whole-exome, whole-genome and whole-transcriptome sequencing data. Then, we discuss major challenges like tumor intra-heterogeneity, tumor sample saturation and functionality of synonymous mutations in cancer, all of which may result in false-positive discoveries. Finally, we highlight new directions in studying regulatory roles of noncoding somatic mutations and quantitatively measuring circulating tumor DNA in cancer. This review may help investigators find an appropriate tool for detecting potential driver or actionable mutations in rapidly emerging precision cancer medicine.
|
30
|
Van den Eynden J, Fierro AC, Verbeke LPC, Marchal K. SomInaClust: detection of cancer genes based on somatic mutation patterns of inactivation and clustering. BMC Bioinformatics 2015; 16:125. [PMID: 25903787 PMCID: PMC4410004 DOI: 10.1186/s12859-015-0555-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 12/19/2014] [Accepted: 03/30/2015] [Indexed: 12/15/2022] Open
Abstract
Background With the advances in high throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively. Results SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore it was found to be complementary to existing similar-purpose methods with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and it accurately classifies candidate driver genes into putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant. Conclusions SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0555-7) contains supplementary material, which is available to authorized users.
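The two mutation patterns contrasted in this abstract (positional clustering for oncogenes, protein-truncating mutations spread along the gene for tumour suppressors) can be illustrated with a minimal Python sketch. It is not the SomInaClust implementation: the function name `mutation_pattern_scores`, the `TRUNCATING` label set, the window size and the input format are all assumptions made for illustration, and no statistical test is performed.

```python
from collections import Counter

# Hypothetical labels for protein-truncating mutation types (an assumption).
TRUNCATING = {"nonsense", "frameshift", "splice_site"}

def mutation_pattern_scores(mutations, window=5):
    """Return (truncating_fraction, cluster_fraction) for one gene.

    mutations: list of (amino_acid_position, mutation_type) tuples.
    cluster_fraction is the share of non-truncating mutations that fall
    in the densest run of `window` consecutive positions.
    """
    if not mutations:
        return 0.0, 0.0
    n_trunc = sum(1 for _, kind in mutations if kind in TRUNCATING)
    # Count non-truncating mutations per position.
    other = Counter(pos for pos, kind in mutations if kind not in TRUNCATING)
    best = 0
    # A densest window can always be chosen to start at a mutated position.
    for start in other:
        best = max(best, sum(other[p] for p in range(start, start + window)))
    n_other = sum(other.values())
    cluster_fraction = best / n_other if n_other else 0.0
    return n_trunc / len(mutations), cluster_fraction
```

Under these assumptions, a gene with a low truncating fraction and a high cluster fraction looks oncogene-like, while a high truncating fraction points towards a tumour suppressor; real thresholds would have to be calibrated against a background model.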
Affiliation(s)
- Jimmy Van den Eynden
- Department of Information Technology, Ghent University - iMinds, Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
| | - Ana Carolina Fierro
- Department of Information Technology, Ghent University - iMinds, Ghent, Belgium.
| | - Lieven P C Verbeke
- Department of Information Technology, Ghent University - iMinds, Ghent, Belgium.
| | - Kathleen Marchal
- Department of Information Technology, Ghent University - iMinds, Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
| |
|
31
|
Pon JR, Marra MA. Driver and Passenger Mutations in Cancer. Annu Rev Pathol 2015; 10:25-50. [DOI: 10.1146/annurev-pathol-012414-040312] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Indexed: 12/28/2022]
Affiliation(s)
- Julia R. Pon
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
| | - Marco A. Marra
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada V6T 1Z4
| |
|
32
|
Oleksiewicz U, Tomczak K, Woropaj J, Markowska M, Stępniak P, Shah PK. Computational characterisation of cancer molecular profiles derived using next generation sequencing. Contemp Oncol (Pozn) 2015; 19:A78-91. [PMID: 25691827 PMCID: PMC4322529 DOI: 10.5114/wo.2014.47137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/30/2023] Open
Abstract
Our current understanding of cancer genetics is grounded on the principle that cancer arises from a clone that has accumulated the requisite somatically acquired genetic aberrations, leading to the malignant transformation. It also results in aberrant gene and protein expression. Next generation sequencing (NGS) or deep sequencing platforms are being used to create large catalogues of changes in copy numbers, mutations, structural variations, gene fusions, gene expression, and other types of information for cancer patients. However, inferring different types of biological changes from raw reads generated using the sequencing experiments is algorithmically and computationally challenging. In this article, we outline common steps for the quality control and processing of NGS data. We highlight the importance of accurate and application-specific alignment of these reads and the methodological steps and challenges in obtaining different types of information. We comment on the importance of integrating these data and building infrastructure to analyse them. We also provide exhaustive lists of available software to obtain information and point the readers to articles comparing software for deeper insight in specialised areas. We hope that the article will guide readers in choosing the right tools for analysing oncogenomic datasets.
Affiliation(s)
- Urszula Oleksiewicz
- Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland; These authors contributed equally to this paper
| | - Katarzyna Tomczak
- Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland; Postgraduate School of Molecular Medicine, Medical University of Warsaw, Warsaw; These authors contributed equally to this paper
| | - Jakub Woropaj
- Poznan University of Economics, Poznań, Poland; These authors contributed equally to this paper
| | | | | | - Parantu K Shah
- Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
33
|
Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 2014; 47:106-14. [PMID: 25501392 PMCID: PMC4444046 DOI: 10.1038/ng.3168] [Citation(s) in RCA: 576] [Impact Index Per Article: 57.6] [Received: 03/20/2014] [Accepted: 11/20/2014] [Indexed: 12/13/2022]
Abstract
Cancers exhibit extensive mutational heterogeneity and the resulting long tail phenomenon complicates the discovery of the genes and pathways that are significantly mutated in cancer. We perform a Pan-Cancer analysis of mutated networks in 3281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a novel algorithm to find mutated subnetworks that overcomes limitations of existing single gene and pathway/network approaches. We identify 14 significantly mutated subnetworks that include well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer including cohesin, condensin, and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, Pan-Cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.
|
34
|
Chen J, Sun M, Shen B. Deciphering oncogenic drivers: from single genes to integrated pathways. Brief Bioinform 2014; 16:413-28. [PMID: 25378434 DOI: 10.1093/bib/bbu039] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 08/02/2014] [Accepted: 09/18/2014] [Indexed: 12/12/2022] Open
Abstract
Technological advances in next-generation sequencing have uncovered a wide spectrum of aberrations in cancer genomes. The extreme diversity in cancer mutations necessitates computational approaches to differentiate between the 'drivers' with vital function in cancer progression and those nonfunctional 'passengers'. Although individual driver mutations are routinely identified, mutational profiles of different tumors are highly heterogeneous. There is growing consensus that pathways rather than single genes are the primary target of mutations. Here we review extant bioinformatics approaches to identifying oncogenic drivers at different mutational levels, highlighting the strategies for discovering driver pathways and networks from cancer mutation data. These approaches will help reduce the mutation complexity, thus providing a simplified picture of cancer.
|
35
|
Vuong H, Cheng F, Lin CC, Zhao Z. Functional consequences of somatic mutations in cancer using protein pocket-based prioritization approach. Genome Med 2014; 6:81. [PMID: 25360158 PMCID: PMC4213513 DOI: 10.1186/s13073-014-0081-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/19/2014] [Accepted: 10/03/2014] [Indexed: 12/12/2022] Open
Abstract
Background Recently, a number of large-scale cancer genome sequencing projects have generated a large volume of somatic mutations; however, identifying the functional consequences and roles of somatic mutations in tumorigenesis remains a major challenge. Researchers have identified that protein pocket regions play critical roles in the interaction of proteins with small molecules, enzymes, and nucleic acids. As such, investigating the features of somatic mutations in protein pocket regions provides a promising approach to identifying new genotype-phenotype relationships in cancer. Methods In this study, we developed a protein pocket-based computational approach to uncover the functional consequences of somatic mutations in cancer. We mapped 1.2 million somatic mutations across 36 cancer types from the COSMIC database and The Cancer Genome Atlas (TCGA) onto the protein pocket regions of over 5,000 protein three-dimensional structures. We further integrated cancer cell line mutation profiles and drug pharmacological data from the Cancer Cell Line Encyclopedia (CCLE) onto protein pocket regions in order to identify putative biomarkers for anticancer drug responses. Results We found that genes harboring protein pocket somatic mutations were significantly enriched in cancer driver genes. Furthermore, genes harboring pocket somatic mutations tended to be highly co-expressed in a co-expressed protein interaction network. Using a statistical framework, we identified four putative cancer genes (RWDD1, NCF1, PLEK, and VAV3), whose expression profiles were associated with overall poor survival rates in melanoma, lung, or colorectal cancer patients. Finally, genes harboring protein pocket mutations were more likely to be drug-sensitive or drug-resistant. In a case study, we illustrated that the BAX gene was associated with the sensitivity to three anticancer drugs (midostaurin, vinorelbine, and tipifarnib). 
Conclusions This study provides novel insights into the functional consequences of somatic mutations during tumorigenesis and for anticancer drug responses. The computational approach used might be beneficial to the study of somatic mutations in the era of cancer precision medicine. Electronic supplementary material The online version of this article (doi:10.1186/s13073-014-0081-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Huy Vuong
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
| | - Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
| | - Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37232 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA
| |
|
36
|
Abstract
High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and that determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and their association to clinical properties. Here, we review cancer genomics software and the insights that have been gained from their application.
|
37
|
A spatial simulation approach to account for protein structure when identifying non-random somatic mutations. BMC Bioinformatics 2014; 15:231. [PMID: 24990767 PMCID: PMC4227039 DOI: 10.1186/1471-2105-15-231] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/15/2013] [Accepted: 05/27/2014] [Indexed: 02/08/2023] Open
Abstract
Background Current research suggests that a small set of “driver” mutations are responsible for tumorigenesis while a larger body of “passenger” mutations occur in the tumor but do not progress the disease. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. Results We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html. Conclusion SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
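The core idea of clustering mutations in 3D space rather than along the sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SpacePAC algorithm: the function name `densest_sphere`, the input dictionaries and the single fixed radius are hypothetical, and the simulation step that the abstract's title refers to (needed to judge whether a cluster is non-random) is omitted.

```python
import math

def densest_sphere(coords, counts, radius):
    """Find the residue whose surrounding sphere holds the most mutations.

    coords: dict residue_number -> (x, y, z), e.g. alpha-carbon coordinates
            taken from a PDB structure (assumed input format).
    counts: dict residue_number -> number of observed mutations.
    Returns (center_residue, mutations_inside_sphere).
    """
    best_center, best_total = None, -1
    for center in sorted(coords):
        # Sum mutation counts over all residues within `radius` of the center.
        total = sum(
            counts.get(res, 0)
            for res, xyz in coords.items()
            if math.dist(coords[center], xyz) <= radius
        )
        if total > best_total:  # ties keep the lowest-numbered residue
            best_center, best_total = center, total
    return best_center, best_total
```

In such a sketch, residues that are far apart in sequence but adjacent in the folded protein contribute to the same sphere, which is the kind of cluster a purely linear scan would miss.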
|
38
|
Ryslik GA, Cheng Y, Cheung KH, Modis Y, Zhao H. A graph theoretic approach to utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 2014; 15:86. [PMID: 24669769 PMCID: PMC4024121 DOI: 10.1186/1471-2105-15-86] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/09/2013] [Accepted: 03/11/2014] [Indexed: 02/23/2023] Open
Abstract
Background It is well known that the development of cancer is caused by the accumulation of somatic mutations within the genome. For oncogenes specifically, current research suggests that there is a small set of "driver" mutations that are primarily responsible for tumorigenesis. Further, due to recent pharmacological successes in treating these driver mutations and their resulting tumors, a variety of approaches have been developed to identify potential driver mutations using methods such as machine learning and mutational clustering. We propose a novel methodology that increases our power to identify mutational clusters by taking into account protein tertiary structure via a graph theoretical approach. Results We have designed and implemented GraphPAC (Graph Protein Amino acid Clustering) to identify mutational clustering while considering protein spatial structure. Using GraphPAC, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of prior clustering based on current methods. Specifically, by utilizing the spatial information available in the Protein Data Bank (PDB) along with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory to account for the tertiary structure, GraphPAC discovers clusters in DPP4, NRP1 and other proteins not identified by existing methods. The R package is available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html. Conclusion GraphPAC provides an alternative to iPAC and an extension to current methodology when identifying potential activating driver mutations by utilizing a graph theoretic approach when considering protein tertiary structure.
Affiliation(s)
- Gregory A Ryslik
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
|
39
|
Raphael BJ, Dobson JR, Oesper L, Vandin F. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med 2014; 6:5. [PMID: 24479672 PMCID: PMC3978567 DOI: 10.1186/gm524] [Citation(s) in RCA: 131] [Impact Index Per Article: 13.1] [Indexed: 02/07/2023]
Abstract
High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.
Affiliation(s)
- Benjamin J Raphael
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Center for Computational Molecular Biology, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Jason R Dobson
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Center for Computational Molecular Biology, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, 185 Meeting Street, Providence, RI 02912, USA
- Layla Oesper
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Fabio Vandin
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI 02912, USA
- Center for Computational Molecular Biology, Brown University, 115 Waterman Street, Providence, RI 02912, USA
|